Merge lp:~schnetter/pocl/main into lp:~pocl/pocl/trunk
- main
- Merge into trunk
Status: | Merged |
---|---|
Merge reported by: | Pekka Jääskeläinen |
Merged at revision: | not available |
Proposed branch: | lp:~schnetter/pocl/main |
Merge into: | lp:~pocl/pocl/trunk |
Diff against target: |
3735 lines (+2544/-357) 56 files modified
.bzrignore (+4/-0) README (+10/-5) configure.ac (+2/-0) examples/Makefile.am (+1/-1) examples/barriers/barriers.c (+1/-1) examples/example1/example1.c (+3/-1) examples/example1/example1_exec.c (+1/-1) examples/example2/example2.c (+1/-1) examples/example2a/example2a.c (+1/-1) examples/forloops/forloops.c (+1/-1) examples/kernel/Makefile.am (+31/-0) examples/kernel/kernel.c (+93/-0) examples/kernel/test_bitselect.cl (+1176/-0) examples/kernel/test_fabs.cl (+178/-0) examples/kernel/test_rotate.cl (+204/-0) examples/loopbarriers/loopbarriers.c (+1/-1) examples/run_all.sh (+1/-1) examples/scalarwave/Makefile.am (+31/-0) examples/scalarwave/scalarwave.c (+233/-0) examples/scalarwave/scalarwave.cl (+94/-0) examples/trig/trig_exec.c (+1/-1) include/_kernel.h (+11/-10) lib/CL/Makefile.am (+1/-0) lib/CL/clCreateBuffer.c (+7/-2) lib/CL/clCreateKernel.c (+10/-8) lib/CL/clEnqueueNDRangeKernel.c (+29/-1) lib/CL/clFinish.c (+31/-0) lib/CL/clGetDeviceIDs.c (+4/-5) lib/CL/clReleaseKernel.c (+6/-1) lib/CL/clReleaseMemObject.c (+1/-1) lib/CL/clSetKernelArg.c (+7/-1) lib/CL/devices/native/native.c (+6/-43) lib/CL/devices/native/native.h (+3/-3) lib/CL/devices/pthread/pthread.c (+27/-54) lib/CL/devices/pthread/pthread.h (+7/-7) lib/CL/pocl_cl.h (+3/-3) lib/kernel/Makefile.am (+5/-3) lib/kernel/arm/Makefile.am (+5/-3) lib/kernel/clz.cl (+1/-1) lib/kernel/get_global_id.c (+8/-4) lib/kernel/get_global_offset.c (+39/-0) lib/kernel/get_global_size.c (+1/-1) lib/kernel/get_group_id.c (+1/-1) lib/kernel/get_local_id.c (+4/-4) lib/kernel/get_local_size.c (+39/-0) lib/kernel/get_num_groups.c (+4/-4) lib/kernel/get_work_dim.c (+31/-0) lib/kernel/popcount.cl (+35/-0) lib/kernel/signbit.cl (+34/-3) lib/kernel/sources.mk (+4/-0) lib/kernel/tce/Makefile.am (+5/-3) lib/kernel/templates.h (+1/-2) lib/kernel/x86_64/Makefile.am (+5/-3) lib/kernel/x86_64/copysign.cl (+0/-169) lib/llvmopencl/Workgroup.cc (+70/-2) tests/testsuite.at (+31/-0) |
To merge this branch: | bzr merge lp:~schnetter/pocl/main |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Pekka Jääskeläinen | Approve | ||
Erik Schnetter | Needs Resubmitting | ||
Review via email:
|
Commit message
Description of the change
This branch passes the same test cases as the trunk: all on Ubuntu, and four failures on Mac OSX.
This branch also introduces a new test case scalarwave, which fails everywhere because of a real problem (that I have not debugged yet). If you want, I can deactivate this new test case, to not introduce a failing test case into the trunk.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
Indeed; I misunderstood the meaning of global_offset.
-erik
2011/12/6 Pekka Jääskeläinen <email address hidden>
> Review: Needs Fixing
>
> get_global_offset() implementation seems wrong. I think it needs the
> separate values from clEnqueueNDRang
> the context struct, IIRC.
>
> In my opinion having failing tests in the test suite that reproduce
> (reported) bugs is fine but they should be marked with XFAIL (with a bug id
> maybe?) or similar until they are fixed so we won't get confused what is a
> regression and what a known bug reproduction (or if you are into TDD, just
> unimplemented features).
>
> Other than these, seems good to merge for me.
> --
> https:/
> You are the owner of lp:~schnetter/pocl/main.
>
--
Erik Schnetter <email address hidden> http://
- 105. By Erik Schnetter
-
Correct implementation of get_global_offset
Introduce global variables to hold the work_dim and global_offset.
Set these variables.
Correct error in setting num_groups.Disable scalarwave test case (could have used XFAIL instead, but I
don't know how to do this). - 106. By Erik Schnetter
-
Ignore two auto-generated files
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
The current head of the branch corrects the global offset issue.
I didn't implement XFAIL since I didn't know how to do this.
- 107. By Erik Schnetter
-
Implement get_work_dim() properly
- 108. By Erik Schnetter
-
More error checking when setting kernel arguments
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
The matrix transpose test case seems to crash here. Some double free? Valgrind should help debugging it.
6: example2: matrix transpose *** glibc detected *** /home/visit0r/
======= Backtrace: =========
/lib/libc.
/lib/libc.
/home/visit0r/
/lib/libpthread
/lib/libc.
- 109. By Erik Schnetter
-
Add debug output to scalarwave example
- 110. By Erik Schnetter
-
Merge from trunk
- 111. By Erik Schnetter
-
Properly free partially allocated memory object if there is an error
- 112. By Erik Schnetter
-
Properly remove pointers from host_buffers list
Remove pointers from host_buffers list when the buffer is freed.
Insert pointers into host_buffers list more efficiently. - 113. By Erik Schnetter
-
Correct memory allocation in kernel handling
Insert kernels into linked list more efficiently.
Temporarily: Don't deallocate kernels (ever), because this would lead
to dangling pointers. Needs more fixing. - 114. By Erik Schnetter
-
Correct malloc problems in handling memory objects
Don't use a linked list to distinguish between user-supplied and
malloc'd pointers. Instead, use the OpenCL memory flags to distinguish
between these.Increase size for /proc/cpuinfo buffer (necessary on my system).
- 115. By Erik Schnetter
-
Merge from trunk
- 116. By Erik Schnetter
-
Merge from trunk
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
The matrix transpose fails because pocl doesn't recognise the local argument as local, and thus doesn't allocate memory for it. This seems Mac OSX specific, and is now bug 901519.
- 117. By Erik Schnetter
-
Merge from trunk: Correct many errors on Mac OSX
- 118. By Erik Schnetter
-
Merge from trunk
- 119. By Erik Schnetter
-
Recommend --enable-debug when configuring
- 120. By Erik Schnetter
-
Simplify code
- 121. By Erik Schnetter
-
Correct signbit implementation
- 122. By Erik Schnetter
-
Add printf prototype
- 123. By Erik Schnetter
-
Add beginnings of testsuite
- 124. By Erik Schnetter
-
Add missing #ifdef cl_khr_fp64
- 125. By Erik Schnetter
-
Ignore autogenerated file
- 126. By Erik Schnetter
-
Add all source files
- 127. By Erik Schnetter
-
"kernel" example: Small improvements
- 128. By Erik Schnetter
-
Correct Scalarwave example. Simplify code.
- 129. By Erik Schnetter
-
Activate Scalarwave test case
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
My branch is now in a good state. All test cases pass, and my new scalar wave example (solving a PDE) is a new test case that also passes.
I also have the beginnings of test cases for the kernel library, but this is still disable because most tests fail (for known reasons, e.g. 3-element vectors don't work, and shift counts are not handled correctly; bug reports have been submitted to llvm).
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
I merged this but please fix the memory leak in clReleaseKernel().
Preview Diff
1 | === modified file '.bzrignore' |
2 | --- .bzrignore 2011-10-31 10:43:04 +0000 |
3 | +++ .bzrignore 2011-12-14 15:59:07 +0000 |
4 | @@ -29,10 +29,14 @@ |
5 | examples/barriers/barriers |
6 | examples/example1/example1 |
7 | examples/example2/example2 |
8 | +examples/example2a/example2a |
9 | examples/forloops/forloops |
10 | +examples/loopbarriers/loopbarriers |
11 | examples/standalone/standalone.bc |
12 | examples/standalone/standalone.h |
13 | examples/trig/trig |
14 | +examples/scalarwave/scalarwave |
15 | +examples/kernel/kernel |
16 | scripts/pocl-build |
17 | scripts/pocl-kernel |
18 | scripts/pocl-standalone |
19 | |
20 | === modified file 'README' |
21 | --- README 2011-10-25 18:05:29 +0000 |
22 | +++ README 2011-12-14 15:59:07 +0000 |
23 | @@ -34,11 +34,15 @@ |
24 | autotools, usually installable from distribution packages 'automake', |
25 | 'autoconf', and 'libtool'. |
26 | |
27 | -Once that is done, the usual GNU build system |
28 | - |
29 | - ./configure && make |
30 | - |
31 | -is enough to build pocl. Builds out of source directory are also supported. |
32 | +Once that is done, the usual GNU build commands build pocl. Builds out |
33 | +of source directory are supported. We recommend using |
34 | + |
35 | + ./configure --enable-debug |
36 | + make |
37 | + |
38 | +This will build pocl without optimization, which simplifies debugging. |
39 | +(This does not influence whether pocl will optimize the code that it |
40 | +generates from OpenCL source files.) |
41 | |
42 | The configure script will use the following special environment variables, |
43 | if present or passed in the command line: |
44 | @@ -59,6 +63,7 @@ |
45 | forloops Simple example wth for loops inside the kernel |
46 | standalone Non-executable standalone example (see STANDALONE section) |
47 | trig Example using various trigonometric functions |
48 | + scalarwave Example evolving the scalar wave equation |
49 | |
50 | |
51 | LINKING YOUR PROGRAM WITH POCL |
52 | |
53 | === modified file 'configure.ac' |
54 | --- configure.ac 2011-12-05 10:43:51 +0000 |
55 | +++ configure.ac 2011-12-14 15:59:07 +0000 |
56 | @@ -227,7 +227,9 @@ |
57 | examples/forloops/Makefile |
58 | examples/standalone/Makefile |
59 | examples/trig/Makefile |
60 | + examples/scalarwave/Makefile |
61 | examples/loopbarriers/Makefile |
62 | + examples/kernel/Makefile |
63 | scripts/Makefile |
64 | tests/Makefile |
65 | tests/atlocal |
66 | |
67 | === modified file 'examples/Makefile.am' |
68 | --- examples/Makefile.am 2011-12-05 10:43:51 +0000 |
69 | +++ examples/Makefile.am 2011-12-14 15:59:07 +0000 |
70 | @@ -22,4 +22,4 @@ |
71 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
72 | # THE SOFTWARE. |
73 | |
74 | -SUBDIRS = example1 example2 example2a barriers forloops standalone trig loopbarriers |
75 | +SUBDIRS = example1 example2 example2a barriers forloops standalone trig scalarwave loopbarriers kernel |
76 | |
77 | === modified file 'examples/barriers/barriers.c' |
78 | --- examples/barriers/barriers.c 2011-11-08 14:24:04 +0000 |
79 | +++ examples/barriers/barriers.c 2011-12-14 15:59:07 +0000 |
80 | @@ -62,7 +62,7 @@ |
81 | |
82 | fclose(source_file); |
83 | |
84 | - context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, |
85 | + context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, |
86 | NULL, NULL, NULL); |
87 | if (context == (cl_context)0) |
88 | return -1; |
89 | |
90 | === modified file 'examples/example1/example1.c' |
91 | --- examples/example1/example1.c 2011-11-08 14:24:04 +0000 |
92 | +++ examples/example1/example1.c 2011-12-14 15:59:07 +0000 |
93 | @@ -39,6 +39,7 @@ |
94 | int source_size; |
95 | cl_float4 *srcA, *srcB; |
96 | cl_float *dst; |
97 | + int ierr; |
98 | int i; |
99 | |
100 | source_file = fopen("example1.cl", "r"); |
101 | @@ -75,7 +76,8 @@ |
102 | srcB[i].w = i; |
103 | } |
104 | |
105 | - exec_dot_product_kernel (source, N, srcA, srcB, dst); |
106 | + ierr = exec_dot_product_kernel (source, N, srcA, srcB, dst); |
107 | + if (ierr) printf ("ERROR\n"); |
108 | |
109 | for (i = 0; i < N; ++i) |
110 | { |
111 | |
112 | === modified file 'examples/example1/example1_exec.c' |
113 | --- examples/example1/example1_exec.c 2011-10-07 15:53:02 +0000 |
114 | +++ examples/example1/example1_exec.c 2011-12-14 15:59:07 +0000 |
115 | @@ -25,7 +25,7 @@ |
116 | cl_int err; |
117 | |
118 | // create the OpenCL context on a GPU device |
119 | - context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, |
120 | + context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, |
121 | NULL, NULL, NULL); |
122 | if (context == (cl_context)0) |
123 | return -1; |
124 | |
125 | === modified file 'examples/example2/example2.c' |
126 | --- examples/example2/example2.c 2011-11-08 14:24:04 +0000 |
127 | +++ examples/example2/example2.c 2011-12-14 15:59:07 +0000 |
128 | @@ -80,7 +80,7 @@ |
129 | input[i * WIDTH + j] = drand48(); |
130 | } |
131 | |
132 | - context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, |
133 | + context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, |
134 | NULL, NULL, NULL); |
135 | if (context == (cl_context)0) |
136 | return -1; |
137 | |
138 | === modified file 'examples/example2a/example2a.c' |
139 | --- examples/example2a/example2a.c 2011-12-05 10:43:51 +0000 |
140 | +++ examples/example2a/example2a.c 2011-12-14 15:59:07 +0000 |
141 | @@ -81,7 +81,7 @@ |
142 | input[i * WIDTH + j] = drand48(); |
143 | } |
144 | |
145 | - context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, |
146 | + context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, |
147 | NULL, NULL, NULL); |
148 | if (context == (cl_context)0) |
149 | return -1; |
150 | |
151 | === modified file 'examples/forloops/forloops.c' |
152 | --- examples/forloops/forloops.c 2011-11-08 14:24:04 +0000 |
153 | +++ examples/forloops/forloops.c 2011-12-14 15:59:07 +0000 |
154 | @@ -62,7 +62,7 @@ |
155 | |
156 | fclose(source_file); |
157 | |
158 | - context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, |
159 | + context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, |
160 | NULL, NULL, NULL); |
161 | if (context == (cl_context)0) |
162 | return -1; |
163 | |
164 | === added directory 'examples/kernel' |
165 | === added file 'examples/kernel/Makefile.am' |
166 | --- examples/kernel/Makefile.am 1970-01-01 00:00:00 +0000 |
167 | +++ examples/kernel/Makefile.am 2011-12-14 15:59:07 +0000 |
168 | @@ -0,0 +1,31 @@ |
169 | +# Process this file with automake to produce Makefile.in (in this, |
170 | +# and all subdirectories). |
171 | +# Makefile.am for pocl/examples/kernel. |
172 | +# |
173 | +# Copyright (c) 2011 Universidad Rey Juan Carlos |
174 | +# |
175 | +# Permission is hereby granted, free of charge, to any person obtaining a copy |
176 | +# of this software and associated documentation files (the "Software"), to deal |
177 | +# in the Software without restriction, including without limitation the rights |
178 | +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
179 | +# copies of the Software, and to permit persons to whom the Software is |
180 | +# furnished to do so, subject to the following conditions: |
181 | +# |
182 | +# The above copyright notice and this permission notice shall be included in |
183 | +# all copies or substantial portions of the Software. |
184 | +# |
185 | +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
186 | +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
187 | +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
188 | +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
189 | +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
190 | +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
191 | +# THE SOFTWARE. |
192 | + |
193 | +noinst_PROGRAMS = kernel |
194 | + |
195 | +kernel_SOURCES = kernel.c test_bitselect.cl test_fabs.cl test_rotate.cl |
196 | +kernel_LDADD = ../../lib/CL/libCL.la -lm @PTHREAD_LIBS@ |
197 | +kernel_CFLAGS = -std=c99 @PTHREAD_CFLAGS@ |
198 | + |
199 | +AM_CPPFLAGS = -I$(top_srcdir)/include -U__APPLE__ -DSRCDIR='"$(abs_srcdir)"' |
200 | |
201 | === added file 'examples/kernel/kernel.c' |
202 | --- examples/kernel/kernel.c 1970-01-01 00:00:00 +0000 |
203 | +++ examples/kernel/kernel.c 2011-12-14 15:59:07 +0000 |
204 | @@ -0,0 +1,93 @@ |
205 | +#include <assert.h> |
206 | +#include <stdio.h> |
207 | +#include <stdlib.h> |
208 | + |
209 | +#include <CL/opencl.h> |
210 | + |
211 | + |
212 | +#ifndef SRCDIR |
213 | +# define SRCDIR "." |
214 | +#endif |
215 | + |
216 | + |
217 | + |
218 | +int call_test(char const *const name) |
219 | +{ |
220 | + /* read source code */ |
221 | + char filename[1000]; |
222 | + snprintf(filename, sizeof filename, "%s/%s.cl", SRCDIR, name); |
223 | + FILE *const source_file = fopen(filename, "r"); |
224 | + assert(source_file != NULL && "source file not found"); |
225 | + |
226 | + fseek(source_file, 0, SEEK_END); |
227 | + long const source_size = ftell(source_file); |
228 | + fseek(source_file, 0, SEEK_SET); |
229 | + |
230 | + char source[source_size + 1]; |
231 | + fread(source, source_size, 1, source_file); |
232 | + source[source_size] = '\0'; |
233 | + |
234 | + fclose(source_file); |
235 | + |
236 | + /* call OpenCL program */ |
237 | + cl_context const context = |
238 | + clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, NULL, NULL, NULL); |
239 | + if (context == 0) return -1; |
240 | + |
241 | + size_t ndevices; |
242 | + clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &ndevices); |
243 | + ndevices /= sizeof(cl_device_id); |
244 | + cl_device_id devices[ndevices]; |
245 | + clGetContextInfo(context, CL_CONTEXT_DEVICES, |
246 | + ndevices*sizeof(cl_device_id), devices, NULL); |
247 | + |
248 | + cl_command_queue const cmd_queue = |
249 | + clCreateCommandQueue(context, devices[0], 0, NULL); |
250 | + if (cmd_queue == 0) return -1; |
251 | + |
252 | + char const *sources[] = {source}; |
253 | + cl_program const program = |
254 | + clCreateProgramWithSource(context, 1, sources, NULL, NULL); |
255 | + if (program == 0) return -1; |
256 | + |
257 | + int ierr; |
258 | + ierr = clBuildProgram(program, 0, NULL, NULL, NULL, NULL); |
259 | + if (ierr != CL_SUCCESS) return -1; |
260 | + |
261 | + cl_kernel const kernel = clCreateKernel(program, name, NULL); |
262 | + if (kernel == 0) return -1; |
263 | + |
264 | + size_t global_work_size[] = {1}; |
265 | + size_t local_work_size[]= {1}; |
266 | + ierr = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, |
267 | + global_work_size, local_work_size, |
268 | + 0, NULL, NULL); |
269 | + if (ierr != CL_SUCCESS) return -1; |
270 | + |
271 | + return 0; |
272 | +} |
273 | + |
274 | + |
275 | + |
276 | +int |
277 | +main(void) |
278 | +{ |
279 | + char const *const tests[] = { |
280 | + "test_bitselect", |
281 | + "test_fabs", |
282 | + "test_rotate", |
283 | + }; |
284 | + int const ntests = sizeof(tests)/sizeof(*tests); |
285 | + for (int i=0; i<ntests; ++i) { |
286 | + printf("Running test #%d %s...\n", i, tests[i]); |
287 | + int ierr; |
288 | + ierr = call_test(tests[i]); |
289 | + if (ierr) { |
290 | + printf("FAIL\n"); |
291 | + return 1; |
292 | + } |
293 | + } |
294 | + |
295 | + printf("DONE\n"); |
296 | + return 0; |
297 | +} |
298 | |
299 | === added file 'examples/kernel/test_bitselect.cl' |
300 | --- examples/kernel/test_bitselect.cl 1970-01-01 00:00:00 +0000 |
301 | +++ examples/kernel/test_bitselect.cl 2011-12-14 15:59:07 +0000 |
302 | @@ -0,0 +1,1176 @@ |
303 | +// TESTING: bitselect |
304 | + |
305 | +#define IMPLEMENT_BODY_G(NAME, BODY, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
306 | + void NAME##_##GTYPE() \ |
307 | + { \ |
308 | + typedef GTYPE gtype; \ |
309 | + typedef SGTYPE sgtype; \ |
310 | + typedef UGTYPE ugtype; \ |
311 | + typedef SUGTYPE sugtype; \ |
312 | + char const *const typename = #GTYPE; \ |
313 | + BODY; \ |
314 | + } |
315 | +#define DEFINE_BODY_G(NAME, EXPR) \ |
316 | + IMPLEMENT_BODY_G(NAME, EXPR, char , char , uchar , uchar ) \ |
317 | + IMPLEMENT_BODY_G(NAME, EXPR, char2 , char , uchar2 , uchar ) \ |
318 | + IMPLEMENT_BODY_G(NAME, EXPR, char3 , char , uchar3 , uchar ) \ |
319 | + IMPLEMENT_BODY_G(NAME, EXPR, char4 , char , uchar4 , uchar ) \ |
320 | + IMPLEMENT_BODY_G(NAME, EXPR, char8 , char , uchar8 , uchar ) \ |
321 | + IMPLEMENT_BODY_G(NAME, EXPR, char16 , char , uchar16 , uchar ) \ |
322 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar , uchar , uchar , uchar ) \ |
323 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar2 , uchar , uchar2 , uchar ) \ |
324 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar3 , uchar , uchar3 , uchar ) \ |
325 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar4 , uchar , uchar4 , uchar ) \ |
326 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar8 , uchar , uchar8 , uchar ) \ |
327 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar16 , uchar , uchar16 , uchar ) \ |
328 | + IMPLEMENT_BODY_G(NAME, EXPR, short , short , ushort , ushort) \ |
329 | + IMPLEMENT_BODY_G(NAME, EXPR, short2 , short , ushort2 , ushort) \ |
330 | + IMPLEMENT_BODY_G(NAME, EXPR, short3 , short , ushort3 , ushort) \ |
331 | + IMPLEMENT_BODY_G(NAME, EXPR, short4 , short , ushort4 , ushort) \ |
332 | + IMPLEMENT_BODY_G(NAME, EXPR, short8 , short , ushort8 , ushort) \ |
333 | + IMPLEMENT_BODY_G(NAME, EXPR, short16 , short , ushort16, ushort) \ |
334 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort , ushort, ushort , ushort) \ |
335 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort2 , ushort, ushort2 , ushort) \ |
336 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort3 , ushort, ushort3 , ushort) \ |
337 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort4 , ushort, ushort4 , ushort) \ |
338 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort8 , ushort, ushort8 , ushort) \ |
339 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort16, ushort, ushort16, ushort) \ |
340 | + IMPLEMENT_BODY_G(NAME, EXPR, int , int , uint , uint ) \ |
341 | + IMPLEMENT_BODY_G(NAME, EXPR, int2 , int , uint2 , uint ) \ |
342 | + IMPLEMENT_BODY_G(NAME, EXPR, int3 , int , uint3 , uint ) \ |
343 | + IMPLEMENT_BODY_G(NAME, EXPR, int4 , int , uint4 , uint ) \ |
344 | + IMPLEMENT_BODY_G(NAME, EXPR, int8 , int , uint8 , uint ) \ |
345 | + IMPLEMENT_BODY_G(NAME, EXPR, int16 , int , uint16 , uint ) \ |
346 | + IMPLEMENT_BODY_G(NAME, EXPR, uint , uint , uint , uint ) \ |
347 | + IMPLEMENT_BODY_G(NAME, EXPR, uint2 , uint , uint2 , uint ) \ |
348 | + IMPLEMENT_BODY_G(NAME, EXPR, uint3 , uint , uint3 , uint ) \ |
349 | + IMPLEMENT_BODY_G(NAME, EXPR, uint4 , uint , uint4 , uint ) \ |
350 | + IMPLEMENT_BODY_G(NAME, EXPR, uint8 , uint , uint8 , uint ) \ |
351 | + IMPLEMENT_BODY_G(NAME, EXPR, uint16 , uint , uint16 , uint ) \ |
352 | + __IF_INT64( \ |
353 | + IMPLEMENT_BODY_G(NAME, EXPR, long , long , ulong , ulong ) \ |
354 | + IMPLEMENT_BODY_G(NAME, EXPR, long2 , long , ulong2 , ulong ) \ |
355 | + IMPLEMENT_BODY_G(NAME, EXPR, long3 , long , ulong3 , ulong ) \ |
356 | + IMPLEMENT_BODY_G(NAME, EXPR, long4 , long , ulong4 , ulong ) \ |
357 | + IMPLEMENT_BODY_G(NAME, EXPR, long8 , long , ulong8 , ulong ) \ |
358 | + IMPLEMENT_BODY_G(NAME, EXPR, long16 , long , ulong16 , ulong ) \ |
359 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong , ulong , ulong , ulong ) \ |
360 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong2 , ulong , ulong2 , ulong ) \ |
361 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong3 , ulong , ulong3 , ulong ) \ |
362 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong4 , ulong , ulong4 , ulong ) \ |
363 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \ |
364 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )) |
365 | + |
366 | +#define CALL_FUNC_G(NAME) \ |
367 | + NAME##_char (); \ |
368 | + NAME##_char2 (); \ |
369 | + NAME##_char3 (); \ |
370 | + NAME##_char4 (); \ |
371 | + NAME##_char8 (); \ |
372 | + NAME##_char16 (); \ |
373 | + NAME##_uchar (); \ |
374 | + NAME##_uchar2 (); \ |
375 | + NAME##_uchar3 (); \ |
376 | + NAME##_uchar4 (); \ |
377 | + NAME##_uchar8 (); \ |
378 | + NAME##_uchar16 (); \ |
379 | + NAME##_short (); \ |
380 | + NAME##_short2 (); \ |
381 | + NAME##_short3 (); \ |
382 | + NAME##_short4 (); \ |
383 | + NAME##_short8 (); \ |
384 | + NAME##_short16 (); \ |
385 | + NAME##_ushort (); \ |
386 | + NAME##_ushort2 (); \ |
387 | + NAME##_ushort3 (); \ |
388 | + NAME##_ushort4 (); \ |
389 | + NAME##_ushort8 (); \ |
390 | + NAME##_ushort16(); \ |
391 | + NAME##_int (); \ |
392 | + NAME##_int2 (); \ |
393 | + NAME##_int3 (); \ |
394 | + NAME##_int4 (); \ |
395 | + NAME##_int8 (); \ |
396 | + NAME##_int16 (); \ |
397 | + NAME##_uint (); \ |
398 | + NAME##_uint2 (); \ |
399 | + NAME##_uint3 (); \ |
400 | + NAME##_uint4 (); \ |
401 | + NAME##_uint8 (); \ |
402 | + NAME##_uint16 (); \ |
403 | + __IF_INT64( \ |
404 | + NAME##_long (); \ |
405 | + NAME##_long2 (); \ |
406 | + NAME##_long3 (); \ |
407 | + NAME##_long4 (); \ |
408 | + NAME##_long8 (); \ |
409 | + NAME##_long16 (); \ |
410 | + NAME##_ulong (); \ |
411 | + NAME##_ulong2 (); \ |
412 | + NAME##_ulong3 (); \ |
413 | + NAME##_ulong4 (); \ |
414 | + NAME##_ulong8 (); \ |
415 | + NAME##_ulong16 ();) |
416 | + |
417 | + |
418 | + |
419 | +#define is_signed(T) ((T)-1 < (T)+1) |
420 | +#define is_floating(T) ((T)0.1 > (T)0.0) |
421 | +#define count_bits(T) (CHAR_BIT * sizeof(T)) |
422 | + |
423 | +DEFINE_BODY_G |
424 | +(test_bitselect, |
425 | + ({ |
426 | + _cl_static_assert(sgtype, !is_floating(sgtype)); |
427 | + uint const randoms[] = { |
428 | + 0x00000000, |
429 | + 0x00000001, |
430 | + 0x7fffffff, |
431 | + 0x80000000, |
432 | + 0xfffffffe, |
433 | + 0xffffffff, |
434 | + 0x01010101, |
435 | + 0x80808080, |
436 | + 0x55555555, |
437 | + 0xaaaaaaaa, |
438 | + 116127149, |
439 | + 331473970, |
440 | + 3314285513, |
441 | + 1531519032, |
442 | + 3871781304, |
443 | + 723260354, |
444 | + 3734992454, |
445 | + 3048883544, |
446 | + 424075405, |
447 | + 3760586679, |
448 | + 364071113, |
449 | + 2212396745, |
450 | + 3026460845, |
451 | + 2062923368, |
452 | + 3945483116, |
453 | + 774301702, |
454 | + 2010645213, |
455 | + 353497300, |
456 | + 2240089293, |
457 | + 645959945, |
458 | + 2929402380, |
459 | + 3641106046, |
460 | + 3731530029, |
461 | + 3788502454, |
462 | + 3990366079, |
463 | + 3532452335, |
464 | + 3231247251, |
465 | + 123690193, |
466 | + 418692672, |
467 | + 4146745661, |
468 | + 4170087687, |
469 | + 3915754726, |
470 | + 2052700648, |
471 | + 1748863847, |
472 | + 276568793, |
473 | + 364266289, |
474 | + 24718041, |
475 | + 3775186845, |
476 | + 935438421, |
477 | + 3070232227, |
478 | + 558364671, |
479 | + 2318351214, |
480 | + 17943242, |
481 | + 1796864907, |
482 | + 727165514, |
483 | + 223478118, |
484 | + 2448924107, |
485 | + 496915291, |
486 | + 3372891854, |
487 | + 361433487, |
488 | + 3273766229, |
489 | + 251831411, |
490 | + 432661417, |
491 | + 772908669, |
492 | + 289792578, |
493 | + 4150526710, |
494 | + 4157662725, |
495 | + 2594757327, |
496 | + 3052388893, |
497 | + 3842089578, |
498 | + 3467269013, |
499 | + 510187125, |
500 | + 2596093643, |
501 | + 398042620, |
502 | + 4272455984, |
503 | + 3711648086, |
504 | + 2120827851, |
505 | + 77269246, |
506 | + 2168059317, |
507 | + 2750549452, |
508 | + 1712682330, |
509 | + 2486520097, |
510 | + 625173621, |
511 | + 1632501477, |
512 | + 2935468416, |
513 | + 980045574, |
514 | + 3080136685, |
515 | + 4291385683, |
516 | + 1900746145, |
517 | + 3343063222, |
518 | + 3737266887, |
519 | + 3349055009, |
520 | + 3557165116, |
521 | + 847440541, |
522 | + 1195278641, |
523 | + 313889830, |
524 | + 622790046, |
525 | + 326637691, |
526 | + 663570370, |
527 | + 662327410, |
528 | + 923839117, |
529 | + 3091793818, |
530 | + 3563062752, |
531 | + 1864236072, |
532 | + 4251970867, |
533 | + 2259486024, |
534 | + 2512789432, |
535 | + 4278284968, |
536 | + 244581614, |
537 | + 247706675, |
538 | + 3268622648, |
539 | + 3758387026, |
540 | + 206893256, |
541 | + 2892198447, |
542 | + 3585538105, |
543 | + 2484801188, |
544 | + 1063964031, |
545 | + 3712657639, |
546 | + 23179627, |
547 | + 1732005357, |
548 | + 2522016557, |
549 | + 1058341654, |
550 | + 1580368080, |
551 | + 1890361257, |
552 | + 1167428989, |
553 | + 2600065453, |
554 | + 1547136389, |
555 | + 945856727, |
556 | + 2005682606, |
557 | + 3399854093, |
558 | + 2619154565, |
559 | + 2207015138, |
560 | + 2836381097, |
561 | + 612928932, |
562 | + 1537934908, |
563 | + 897756908, |
564 | + 1142275256, |
565 | + 1106163744, |
566 | + 3209429231, |
567 | + 3317761168, |
568 | + 2815958850, |
569 | + 1282374282, |
570 | + 3861163766, |
571 | + 2547903564, |
572 | + 3139840265, |
573 | + 587243656, |
574 | + 3261127556, |
575 | + 3955999184, |
576 | + 2061849860, |
577 | + 3778058575, |
578 | + 259659645, |
579 | + 935157504, |
580 | + 3294850933, |
581 | + 2164603733, |
582 | + 3772888022, |
583 | + 732201413, |
584 | + 3677934092, |
585 | + 321204420, |
586 | + 509807651, |
587 | + 3626474557, |
588 | + 284622251, |
589 | + 3655952885, |
590 | + 1512028769, |
591 | + 1102588652, |
592 | + 2700179235, |
593 | + 4167405174, |
594 | + 2672050627, |
595 | + 3410780487, |
596 | + 4153733940, |
597 | + 2459759898, |
598 | + 568792515, |
599 | + 1081882827, |
600 | + 3211871042, |
601 | + 799411732, |
602 | + 2101993855, |
603 | + 3415550991, |
604 | + 3872737342, |
605 | + 4168312654, |
606 | + 1889019671, |
607 | + 4247531636, |
608 | + 2442118552, |
609 | + 3024016549, |
610 | + 1041817509, |
611 | + 141773691, |
612 | + 28033810, |
613 | + 4034097901, |
614 | + 1532981240, |
615 | + 2593712697, |
616 | + 2751535537, |
617 | + 269072724, |
618 | + 3363560906, |
619 | + 3555817938, |
620 | + 611297346, |
621 | + 366972507, |
622 | + 788151801, |
623 | + 3990920857, |
624 | + 1611303958, |
625 | + 3353102293, |
626 | + 1334246396, |
627 | + 1114446428, |
628 | + 3491128109, |
629 | + 2922751152, |
630 | + 3053407478, |
631 | + 2897830841, |
632 | + 176546593, |
633 | + 3184221063, |
634 | + 37923477, |
635 | + 1692128510, |
636 | + 165719856, |
637 | + 1795746307, |
638 | + 2422422413, |
639 | + 253227286, |
640 | + 2188522595, |
641 | + 582156087, |
642 | + 2342528685, |
643 | + 2080142547, |
644 | + 1928462563, |
645 | + 2713927482, |
646 | + 1944972771, |
647 | + 2534268146, |
648 | + 830798003, |
649 | + 1653357460, |
650 | + 291743070, |
651 | + 593771532, |
652 | + 2941865444, |
653 | + 855254640, |
654 | + 2401129822, |
655 | + 2420945774, |
656 | + 2447532144, |
657 | + 1137540092, |
658 | + 1296659939, |
659 | + 3252539825, |
660 | + 1165427708, |
661 | + 3251476781, |
662 | + 2597490804, |
663 | + 2518198923, |
664 | + 1196242486, |
665 | + 3646082981, |
666 | + 1347758965, |
667 | + 3824891532, |
668 | + 2959519286, |
669 | + 1523237529, |
670 | + 2910666174, |
671 | + 3226637035, |
672 | + 2116458903, |
673 | + 1076998092, |
674 | + 4222762545, |
675 | + 3061300520, |
676 | + 4189298288, |
677 | + 3943996060, |
678 | + 3129210496, |
679 | + 3826669630, |
680 | + 4235952488, |
681 | + 2624429853, |
682 | + 2522766390, |
683 | + 4137227001, |
684 | + 3846448057, |
685 | + 1893377487, |
686 | + 3658784739, |
687 | + 2368074586, |
688 | + 170547540, |
689 | + 520741120, |
690 | + 2662229630, |
691 | + 4265731754, |
692 | + 1379762094, |
693 | + 3395502906, |
694 | + 2242123335, |
695 | + 1960965916, |
696 | + 561815223, |
697 | + 2687853297, |
698 | + 4051050259, |
699 | + 1845906614, |
700 | + 3725623071, |
701 | + 1857706909, |
702 | + 2487006596, |
703 | + 1925919247, |
704 | + 2796536825, |
705 | + 3499954730, |
706 | + 2173320675, |
707 | + 3416676849, |
708 | + 3637473517, |
709 | + 340951464, |
710 | + 4152841543, |
711 | + 3747544606, |
712 | + 2659955417, |
713 | + 1695145107, |
714 | + 3117280269, |
715 | + 826143012, |
716 | + 3867179892, |
717 | + 4269349771, |
718 | + 1002613766, |
719 | + 3842086144, |
720 | + 1431990957, |
721 | + 2466205499, |
722 | + 653575141, |
723 | + 293530756, |
724 | + 2318035308, |
725 | + 3728576309, |
726 | + 1697894989, |
727 | + 2955143882, |
728 | + 2109912287, |
729 | + 2764187839, |
730 | + 1805490664, |
731 | + 672567480, |
732 | + 1374741155, |
733 | + 1662665091, |
734 | + 3551530257, |
735 | + 350283994, |
736 | + 685023916, |
737 | + 1887748803, |
738 | + 1386316091, |
739 | + 185708823, |
740 | + 3106823178, |
741 | + 3014109065, |
742 | + 3823816879, |
743 | + 2213358313, |
744 | + 2696977340, |
745 | + 4075569311, |
746 | + 365089277, |
747 | + 3466850767, |
748 | + 312392153, |
749 | + 1065191758, |
750 | + 2405243644, |
751 | + 3174745999, |
752 | + 3617861250, |
753 | + 867192904, |
754 | + 1046475095, |
755 | + 1888985494, |
756 | + 1127140157, |
757 | + 61671281, |
758 | + 128055546, |
759 | + 2332619657, |
760 | + 993669439, |
761 | + 2145370329, |
762 | + 1462433204, |
763 | + 74990676, |
764 | + 2898191247, |
765 | + 3601586977, |
766 | + 794604597, |
767 | + 3597643629, |
768 | + 4282141339, |
769 | + 251591051, |
770 | + 84943504, |
771 | + 2016044077, |
772 | + 946823499, |
773 | + 648214756, |
774 | + 2530104367, |
775 | + 4254219656, |
776 | + 1974542801, |
777 | + 53097687, |
778 | + 157109688, |
779 | + 299310673, |
780 | + 2866882336, |
781 | + 3335682769, |
782 | + 2583612755, |
783 | + 4114730718, |
784 | + 740387484, |
785 | + 986157357, |
786 | + 1140355266, |
787 | + 2825639379, |
788 | + 1198731547, |
789 | + 1521261313, |
790 | + 1204836445, |
791 | + 4294274455, |
792 | + 2215732661, |
793 | + 1369520150, |
794 | + 1515223958, |
795 | + 2428295267, |
796 | + 1945985266, |
797 | + 2168529560, |
798 | + 3791933294, |
799 | + 4021389338, |
800 | + 713695045, |
801 | + 4254483898, |
802 | + 3795986293, |
803 | + 1347498014, |
804 | + 1746051095, |
805 | + 1364967734, |
806 | + 206265390, |
807 | + 3940088473, |
808 | + 1867270033, |
809 | + 3893545471, |
810 | + 3545819698, |
811 | + 2573105187, |
812 | + 3859595967, |
813 | + 2823745089, |
814 | + 1293424244, |
815 | + 3948799370, |
816 | + 1524394803, |
817 | + 3807487752, |
818 | + 4055830971, |
819 | + 3124609223, |
820 | + 119357574, |
821 | + 1490516894, |
822 | + 3799908122, |
823 | + 1700941394, |
824 | + 80878888, |
825 | + 2719184407, |
826 | + 3603450215, |
827 | + 27225525, |
828 | + 1413638246, |
829 | + 3350206268, |
830 | + 2643568519, |
831 | + 801305037, |
832 | + 1341902999, |
833 | + 1420459209, |
834 | + 968648411, |
835 | + 1826125841, |
836 | + 2619721007, |
837 | + 537879916, |
838 | + 860253620, |
839 | + 586683700, |
840 | + 827412286, |
841 | + 2724526294, |
842 | + 1019678576, |
843 | + 3998975225, |
844 | + 339789397, |
845 | + 863181640, |
846 | + 970475690, |
847 | + 2737385140, |
848 | + 322021174, |
849 | + 4084948327, |
850 | + 80691950, |
851 | + 1702782677, |
852 | + 1266230197, |
853 | + 1100861683, |
854 | + 3123418948, |
855 | + 258978579, |
856 | + 3217833394, |
857 | + 1780903315, |
858 | + 1345341356, |
859 | + 2927579299, |
860 | + 931392918, |
861 | + 9404798, |
862 | + 83278219, |
863 | + 2470714323, |
864 | + 640357359, |
865 | + 2169696414, |
866 | + 496463525, |
867 | + 4127940882, |
868 | + 2965369765, |
869 | + 4136333330, |
870 | + 1159134689, |
871 | + 1798163043, |
872 | + 4097403856, |
873 | + 4284804850, |
874 | + 3165524545, |
875 | + 2765224926, |
876 | + 931350022, |
877 | + 1171636623, |
878 | + 845799406, |
879 | + 709853915, |
880 | + 2348457302, |
881 | + 3343956878, |
882 | + 2438786363, |
883 | + 175730452, |
884 | + 598587430, |
885 | + 2744955366, |
886 | + 447049527, |
887 | + 1252796590, |
888 | + 3044128900, |
889 | + 812683575, |
890 | + 3721040746, |
891 | + 3404688504, |
892 | + 2674021068, |
893 | + 959056069, |
894 | + 322162714, |
895 | + 2008064015, |
896 | + 3758321185, |
897 | + 2877937989, |
898 | + 778007512, |
899 | + 3502772435, |
900 | + 3084124565, |
901 | + 111844966, |
902 | + 248248909, |
903 | + 22147113, |
904 | + 2506501875, |
905 | + 1430033847, |
906 | + 1690841637, |
907 | + 2999017281, |
908 | + 3658748205, |
909 | + 1632773934, |
910 | + 4177069459, |
911 | + 3187781304, |
912 | + 1182255965, |
913 | + 4121685939, |
914 | + 300554973, |
915 | + 2854502901, |
916 | + 642657206, |
917 | + 1504346771, |
918 | + 128405037, |
919 | + 2163092164, |
920 | + 1091806675, |
921 | + 1144089805, |
922 | + 54479906, |
923 | + 505543118, |
924 | + 2844153548, |
925 | + 1010229282, |
926 | + 2961721580, |
927 | + 4235612700, |
928 | + 3508832243, |
929 | + 1409461040, |
930 | + 2568735295, |
931 | + 1191284023, |
932 | + 2220949766, |
933 | + 2605559386, |
934 | + 706551146, |
935 | + 3452279268, |
936 | + 2372892169, |
937 | + 2360210709, |
938 | + 3228881405, |
939 | + 2987444766, |
940 | + 1187314024, |
941 | + 908783041, |
942 | + 144096950, |
943 | + 1915948100, |
944 | + 2171208878, |
945 | + 420772043, |
946 | + 793209353, |
947 | + 359527746, |
948 | + 625018196, |
949 | + 1195796799, |
950 | + 2079388581, |
951 | + 864869238, |
952 | + 765565143, |
953 | + 1069647859, |
954 | + 3857355469, |
955 | + 2436437044, |
956 | + 238157644, |
957 | + 1612883577, |
958 | + 1911189891, |
959 | + 2070273440, |
960 | + 384222456, |
961 | + 1186369477, |
962 | + 2844794758, |
963 | + 3435869876, |
964 | + 1486894286, |
965 | + 4062343990, |
966 | + 440437688, |
967 | + 306253241, |
968 | + 3650751868, |
969 | + 2695961920, |
970 | + 3920128930, |
971 | + 3921419250, |
972 | + 502951143, |
973 | + 311093469, |
974 | + 2708936678, |
975 | + 36677206, |
976 | + 3473343884, |
977 | + 577655290, |
978 | + 3795127787, |
979 | + 1448118037, |
980 | + 436359554, |
981 | + 2051970204, |
982 | + 2644913053, |
983 | + 2492587228, |
984 | + 3125803824, |
985 | + 150160619, |
986 | + 1725373463, |
987 | + 2221292372, |
988 | + 2580064663, |
989 | + 1330289179, |
990 | + 2700556441, |
991 | + 1327212925, |
992 | + 651999045, |
993 | + 2089310372, |
994 | + 3221246949, |
995 | + 4148251434, |
996 | + 4267892623, |
997 | + 897583443, |
998 | + 1051813251, |
999 | + 2131903377, |
1000 | + 4121163297, |
1001 | + 4128279241, |
1002 | + 1634689556, |
1003 | + 3369895626, |
1004 | + 1121895497, |
1005 | + 3158192590, |
1006 | + 4290462018, |
1007 | + 3447288838, |
1008 | + 4035505534, |
1009 | + 2945114940, |
1010 | + 1556028368, |
1011 | + 4235061319, |
1012 | + 1535570089, |
1013 | + 2144940257, |
1014 | + 1961364931, |
1015 | + 2509075082, |
1016 | + 804411045, |
1017 | + 2290609740, |
1018 | + 1076471626, |
1019 | + 3254493188, |
1020 | + 4284011230, |
1021 | + 923006875, |
1022 | + 3722016670, |
1023 | + 2981439178, |
1024 | + 2038308778, |
1025 | + 1755166344, |
1026 | + 488581856, |
1027 | + 2624361425, |
1028 | + 1298790575, |
1029 | + 3550671725, |
1030 | + 1845109437, |
1031 | + 2047411775, |
1032 | + 2488464246, |
1033 | + 1391825885, |
1034 | + 2340290304, |
1035 | + 3623879917, |
1036 | + 217171099, |
1037 | + 3698905333, |
1038 | + 2718846041, |
1039 | + 73731529, |
1040 | + 2053405441, |
1041 | + 2770197347, |
1042 | + 2983996080, |
1043 | + 2612966141, |
1044 | + 2187183079, |
1045 | + 2796212469, |
1046 | + 3797629169, |
1047 | + 1788932364, |
1048 | + 17748377, |
1049 | + 627297271, |
1050 | + 3689459731, |
1051 | + 3311799950, |
1052 | + 4263162298, |
1053 | + 4016852324, |
1054 | + 3136750215, |
1055 | + 1725824049, |
1056 | + 2844064064, |
1057 | + 2059159211, |
1058 | + 3182127070, |
1059 | + 470655679, |
1060 | + 1166949584, |
1061 | + 2425843062, |
1062 | + 219908183, |
1063 | + 161770982, |
1064 | + 2394961157, |
1065 | + 999226372, |
1066 | + 2367624166, |
1067 | + 76287885, |
1068 | + 1110832227, |
1069 | + 3358123709, |
1070 | + 1504127646, |
1071 | + 49596774, |
1072 | + 1296560019, |
1073 | + 2320978173, |
1074 | + 1163934122, |
1075 | + 1631947491, |
1076 | + 2702852639, |
1077 | + 3856755518, |
1078 | + 2562943123, |
1079 | + 991330989, |
1080 | + 993726248, |
1081 | + 2133737192, |
1082 | + 20974150, |
1083 | + 3808389889, |
1084 | + 2447868340, |
1085 | + 2434828629, |
1086 | + 3344419509, |
1087 | + 4076789444, |
1088 | + 1446054487, |
1089 | + 3815933708, |
1090 | + 3644670988, |
1091 | + 3175898122, |
1092 | + 3057844745, |
1093 | + 559106380, |
1094 | + 1840065631, |
1095 | + 3020573012, |
1096 | + 3203040371, |
1097 | + 997381925, |
1098 | + 2563312032, |
1099 | + 815510593, |
1100 | + 121805231, |
1101 | + 1047507862, |
1102 | + 1841403695, |
1103 | + 1563170561, |
1104 | + 1644198099, |
1105 | + 3470882735, |
1106 | + 627296501, |
1107 | + 3006157508, |
1108 | + 383648566, |
1109 | + 3136652449, |
1110 | + 2252034149, |
1111 | + 1749861990, |
1112 | + 956381402, |
1113 | + 3299624735, |
1114 | + 2798395931, |
1115 | + 270054444, |
1116 | + 3757564211, |
1117 | + 2933717597, |
1118 | + 1080178310, |
1119 | + 1367392714, |
1120 | + 1135266342, |
1121 | + 2642448461, |
1122 | + 1067554284, |
1123 | + 3694982777, |
1124 | + 3594374699, |
1125 | + 4170301369, |
1126 | + 3593401570, |
1127 | + 2298071009, |
1128 | + 1561680798, |
1129 | + 2788490866, |
1130 | + 1757829499, |
1131 | + 8819607, |
1132 | + 2453686068, |
1133 | + 3458682663, |
1134 | + 1614888171, |
1135 | + 2327536307, |
1136 | + 13960177, |
1137 | + 125752716, |
1138 | + 2312371195, |
1139 | + 1515197240, |
1140 | + 189747227, |
1141 | + 666988376, |
1142 | + 1401118738, |
1143 | + 986465965, |
1144 | + 242793663, |
1145 | + 1830586663, |
1146 | + 1603054176, |
1147 | + 391536104, |
1148 | + 1403125754, |
1149 | + 4021998614, |
1150 | + 157985039, |
1151 | + 966292223, |
1152 | + 2476444819, |
1153 | + 3261614719, |
1154 | + 3888752449, |
1155 | + 2300656903, |
1156 | + 1138839559, |
1157 | + 1227396086, |
1158 | + 1029493665, |
1159 | + 2138482384, |
1160 | + 2182525175, |
1161 | + 1437393012, |
1162 | + 2758514342, |
1163 | + 1394715363, |
1164 | + 242430786, |
1165 | + 4026759135, |
1166 | + 379455166, |
1167 | + 3454852592, |
1168 | + 1128257576, |
1169 | + 513994046, |
1170 | + 2437643547, |
1171 | + 1851772774, |
1172 | + 1096918785, |
1173 | + 2537378072, |
1174 | + 2020382559, |
1175 | + 1306056753, |
1176 | + 519939769, |
1177 | + 2477462755, |
1178 | + 2962076712, |
1179 | + 2856059355, |
1180 | + 111272034, |
1181 | + 2363778749, |
1182 | + 3031510224, |
1183 | + 297098997, |
1184 | + 2716928589, |
1185 | + 1988398361, |
1186 | + 3715685207, |
1187 | + 1158387390, |
1188 | + 3239718824, |
1189 | + 214276640, |
1190 | + 1240159361, |
1191 | + 302800084, |
1192 | + 258391670, |
1193 | + 3118615408, |
1194 | + 1789752935, |
1195 | + 935790045, |
1196 | + 1678444383, |
1197 | + 3645357112, |
1198 | + 1752731774, |
1199 | + 1211889371, |
1200 | + 2432949496, |
1201 | + 1983838022, |
1202 | + 2563701701, |
1203 | + 3235972690, |
1204 | + 2732559614, |
1205 | + 4173627589, |
1206 | + 918129740, |
1207 | + 3528101943, |
1208 | + 945287787, |
1209 | + 783593046, |
1210 | + 1687101911, |
1211 | + 4265659819, |
1212 | + 1625936204, |
1213 | + 419423123, |
1214 | + 404748783, |
1215 | + 174814826, |
1216 | + 561306387, |
1217 | + 441376876, |
1218 | + 3649973873, |
1219 | + 1191532754, |
1220 | + 493829681, |
1221 | + 462640703, |
1222 | + 3037639795, |
1223 | + 4234288143, |
1224 | + 787992128, |
1225 | + 354556603, |
1226 | + 1391557094, |
1227 | + 1227150157, |
1228 | + 25592400, |
1229 | + 3032298621, |
1230 | + 1655829692, |
1231 | + 1736544192, |
1232 | + 2936173068, |
1233 | + 1867683432, |
1234 | + 3284761215, |
1235 | + 2988749127, |
1236 | + 62083315, |
1237 | + 3675433852, |
1238 | + 1134152479, |
1239 | + 2537382040, |
1240 | + 1147996351, |
1241 | + 1287284159, |
1242 | + 1889610942, |
1243 | + 3549411223, |
1244 | + 2634772335, |
1245 | + 1621708033, |
1246 | + 3268420142, |
1247 | + 2635222095, |
1248 | + 2856377255, |
1249 | + 3703296204, |
1250 | + 45831019, |
1251 | + 1997278369, |
1252 | + 1472530726, |
1253 | + 4202051236, |
1254 | + 1958581642, |
1255 | + 1899513707, |
1256 | + 1642075765, |
1257 | + 217373156, |
1258 | + 1177071505, |
1259 | + 2179831909, |
1260 | + 1894821896, |
1261 | + 375785474, |
1262 | + 140181353, |
1263 | + 2743987480, |
1264 | + 123627609, |
1265 | + 3644816362, |
1266 | + 4244769687, |
1267 | + 4053481902, |
1268 | + 4272740073, |
1269 | + 1701735471, |
1270 | + 1799303028, |
1271 | + 2810175160, |
1272 | + 1531107068, |
1273 | + 3059813822, |
1274 | + 4125025775, |
1275 | + 1932301928, |
1276 | + 358163550, |
1277 | + 1246286294, |
1278 | + 1901878857, |
1279 | + 2449370117, |
1280 | + 4061706076, |
1281 | + 2875797072, |
1282 | + 1661522553, |
1283 | + 543545982, |
1284 | + 300448222, |
1285 | + 4019581644, |
1286 | + 3197346443, |
1287 | + 731278538, |
1288 | + 457112622, |
1289 | + 669625172, |
1290 | + 2548620393, |
1291 | + 2931934447, |
1292 | + 2318225955, |
1293 | + 427149964, |
1294 | + 1097556601, |
1295 | + 3585697077, |
1296 | + 1901391738, |
1297 | + 3019912350, |
1298 | + 4193989774, |
1299 | + 1411691495, |
1300 | + 2549773310, |
1301 | + 3130489018, |
1302 | + 739444137, |
1303 | + 1953561922, |
1304 | + 228589899, |
1305 | + 974825144, |
1306 | + 1873934953, |
1307 | + 918502475, |
1308 | + 4020302125, |
1309 | + 2103082289, |
1310 | + 1474428456, |
1311 | + 269315616, |
1312 | + 3376419786, |
1313 | + 2903506696, |
1314 | + 169344159, |
1315 | + 4151327830, |
1316 | + 2861975985, |
1317 | + 1583628545, |
1318 | + 337656074, |
1319 | + 2381206238, |
1320 | + 1346357469, |
1321 | + 3316549550, |
1322 | + 1188140897, |
1323 | + 928463634, |
1324 | + 120466083, |
1325 | + 1048016215, |
1326 | + 2053770646, |
1327 | + 3729204448, |
1328 | + 3630812747, |
1329 | + 3421817962, |
1330 | + 1471357089, |
1331 | + 2971633393, |
1332 | + 2721366758, |
1333 | + 3977792328, |
1334 | + 2771228423, |
1335 | + 258029855, |
1336 | + 325097628, |
1337 | + 2816869331, |
1338 | + 228010778, |
1339 | + 1815596248, |
1340 | + 2677647806, |
1341 | + 4069826588, |
1342 | + 2009464559, |
1343 | + 4003870353, |
1344 | + 2558198381, |
1345 | + 823508134, |
1346 | + 256895388, |
1347 | + 130455482, |
1348 | + 4107398577, |
1349 | + 2446165146, |
1350 | + 3086759840, |
1351 | + 3128842794, |
1352 | + 236454548, |
1353 | + 3740649072, |
1354 | + 1049081391, |
1355 | + 3780795812, |
1356 | + 1964380357, |
1357 | + 3900635454, |
1358 | + 1941196066, |
1359 | + 1143285596, |
1360 | + 1276856333, |
1361 | + 2919547816, |
1362 | + 2947639569, |
1363 | + 1889305089, |
1364 | + 2386910172, |
1365 | + 2685680362, |
1366 | + 2042792556, |
1367 | + 2780968041, |
1368 | + 976912013, |
1369 | + 3562274424, |
1370 | + 2336140155, |
1371 | + 3464857244, |
1372 | + 1108365812, |
1373 | + 1201566469, |
1374 | + 707126700, |
1375 | + 4047776595, |
1376 | + 1289380202, |
1377 | + 1231913128, |
1378 | + 2819729319, |
1379 | + 537908270, |
1380 | + 3802355886, |
1381 | + 2004615093, |
1382 | + 2947614997, |
1383 | + 4192189156, |
1384 | + 2809733754, |
1385 | + 3082820238, |
1386 | + 2758499499, |
1387 | + 1004612882, |
1388 | + 1102702383, |
1389 | + 1862546275, |
1390 | + 3170345990, |
1391 | + 883739952, |
1392 | + 1641198615, |
1393 | + 957782688, |
1394 | + 1503652889, |
1395 | + 2210400768, |
1396 | + 2002162781, |
1397 | + 1553086024, |
1398 | + 2591721606, |
1399 | + 3830165160, |
1400 | + 4181044959, |
1401 | + 2735782270, |
1402 | + 3825677158, |
1403 | + 143739895, |
1404 | + 771193452, |
1405 | + 35990560, |
1406 | + 1014009970, |
1407 | + 20768744, |
1408 | + 1785268932, |
1409 | + 1424740580, |
1410 | + 1620237280, |
1411 | + 848157259, |
1412 | + 3808893671, |
1413 | + 2746756110, |
1414 | + 3903639825, |
1415 | + 1822084165, |
1416 | + 2891666588, |
1417 | + 3853186896, |
1418 | + 4248495212, |
1419 | + 1178592425, |
1420 | + 455721495, |
1421 | + 1848821934, |
1422 | + 1558397701, |
1423 | + 133397899, |
1424 | + 1845531767, |
1425 | + 2798312897, |
1426 | + 1471176399, |
1427 | + 1743248506, |
1428 | + 2229972777, |
1429 | + 1290369879, |
1430 | + 3579075953, |
1431 | + 309034994, |
1432 | + 929728690, |
1433 | + 3841454719, |
1434 | + 3031753515, |
1435 | + 3606461413, |
1436 | + 2412281758, |
1437 | + 2993123515, |
1438 | + }; |
1439 | + int const nrandoms = sizeof(randoms) / sizeof(*randoms); |
1440 | + |
1441 | + int const bits = count_bits(sgtype); |
1442 | + for (int iter=0; iter<nrandoms; ++iter) { |
1443 | + typedef union { |
1444 | + gtype v; |
1445 | + sgtype s[16]; |
1446 | + } Tvec; |
1447 | + Tvec sel, left, right, res; |
1448 | + int vecsize = vec_step(gtype); |
1449 | + for (int n=0; n<vecsize; ++n) { |
1450 | + sel.s[n] = randoms[(iter+n ) % nrandoms]; |
1451 | + left.s[n] = randoms[(iter+n+20) % nrandoms]; |
1452 | + right.s[n] = randoms[(iter+n+40) % nrandoms]; |
1453 | + if (bits>32) { |
1454 | + sel.s[n] = (sel.s[n] << (bits/2)) | randoms[(iter+n+100) % nrandoms]; |
1455 | + left.s[n] = (left.s[n] << (bits/2)) | randoms[(iter+n+120) % nrandoms]; |
1456 | + right.s[n] = (right.s[n] << (bits/2)) | randoms[(iter+n+140) % nrandoms]; |
1457 | + } |
1458 | + } |
1459 | + res.v = bitselect(left.v, right.v, sel.v); |
1460 | + bool equal = true; |
1461 | + for (int n=0; n<vecsize; ++n) { |
1462 | + equal = equal && ((res.s[n] & ~sel.s[n]) == (left.s[n] & ~sel.s[n])); |
1463 | + equal = equal && ((res.s[n] & sel.s[n]) == (right.s[n] & sel.s[n])); |
1464 | + } |
1465 | + if (!equal) { |
1466 | + printf("FAIL: bitselect type=%s a=0x%08x b=0x%08x c=0x%08x c=0x%08x\n", |
1467 | + typename, |
1468 | + (uint)left.s[0], (uint)right.s[0], (uint)sel.s[0], |
1469 | + (uint)res.s[0]); |
1470 | + } |
1471 | + } |
1472 | + }) |
1473 | + ) |
1474 | + |
1475 | +void test_bitselect() |
1476 | +{ |
1477 | + CALL_FUNC_G(test_bitselect) |
1478 | +} |
1479 | |
1480 | === added file 'examples/kernel/test_fabs.cl' |
1481 | --- examples/kernel/test_fabs.cl 1970-01-01 00:00:00 +0000 |
1482 | +++ examples/kernel/test_fabs.cl 2011-12-14 15:59:07 +0000 |
1483 | @@ -0,0 +1,178 @@ |
1484 | +// TESTING: fabs |
1485 | +// TESTING: signbit |
1486 | +// TESTING: copysign |
1487 | + |
1488 | +#define IMPLEMENT_BODY_V(NAME, BODY, VTYPE, STYPE, JTYPE, SJTYPE) \ |
1489 | + void NAME##_##VTYPE() \ |
1490 | + { \ |
1491 | + typedef VTYPE vtype; \ |
1492 | + typedef STYPE stype; \ |
1493 | + typedef JTYPE jtype; \ |
1494 | + typedef SJTYPE sjtype; \ |
1495 | + char const *const typename = #VTYPE; \ |
1496 | + BODY; \ |
1497 | + } |
1498 | +#define DEFINE_BODY_V(NAME, EXPR) \ |
1499 | + IMPLEMENT_BODY_V(NAME, EXPR, float , float , int , int ) \ |
1500 | + IMPLEMENT_BODY_V(NAME, EXPR, float2 , float , int2 , int ) \ |
1501 | + IMPLEMENT_BODY_V(NAME, EXPR, float3 , float , int3 , int ) \ |
1502 | + IMPLEMENT_BODY_V(NAME, EXPR, float4 , float , int4 , int ) \ |
1503 | + IMPLEMENT_BODY_V(NAME, EXPR, float8 , float , int8 , int ) \ |
1504 | + IMPLEMENT_BODY_V(NAME, EXPR, float16 , float , int16 , int ) \ |
1505 | + __IF_FP64( \ |
1506 | + IMPLEMENT_BODY_V(NAME, EXPR, double , double, long , long) \ |
1507 | + IMPLEMENT_BODY_V(NAME, EXPR, double2 , double, long2 , long) \ |
1508 | + IMPLEMENT_BODY_V(NAME, EXPR, double3 , double, long3 , long) \ |
1509 | + IMPLEMENT_BODY_V(NAME, EXPR, double4 , double, long4 , long) \ |
1510 | + IMPLEMENT_BODY_V(NAME, EXPR, double8 , double, long8 , long) \ |
1511 | + IMPLEMENT_BODY_V(NAME, EXPR, double16, double, long16, long)) |
1512 | + |
1513 | +#define CALL_FUNC_V(NAME) \ |
1514 | + NAME##_float (); \ |
1515 | + NAME##_float2 (); \ |
1516 | + NAME##_float3 (); \ |
1517 | + NAME##_float4 (); \ |
1518 | + NAME##_float8 (); \ |
1519 | + NAME##_float16 (); \ |
1520 | + __IF_FP64( \ |
1521 | + NAME##_double (); \ |
1522 | + NAME##_double2 (); \ |
1523 | + NAME##_double3 (); \ |
1524 | + NAME##_double4 (); \ |
1525 | + NAME##_double8 (); \ |
1526 | + NAME##_double16();) |
1527 | + |
1528 | + |
1529 | + |
1530 | +#define is_signed(T) ((T)-1 < (T)+1) |
1531 | +#define is_floating(T) ((T)0.1 > (T)0.0) |
1532 | +#define count_bits(T) (CHAR_BIT * sizeof(T)) |
1533 | + |
1534 | +DEFINE_BODY_V |
1535 | +(test_fabs, |
1536 | + ({ |
1537 | + _cl_static_assert(stype, is_floating(stype)); |
1538 | + float const values[] = { |
1539 | + 0.0f, |
1540 | + 0.1f, |
1541 | + 0.9f, |
1542 | + 1.0f, |
1543 | + 1.1f, |
1544 | + 10.0f, |
1545 | + 1000000.0f, |
1546 | + 1000000000000.0f, |
1547 | + MAXFLOAT, |
1548 | + HUGE_VALF, |
1549 | + INFINITY, |
1550 | + /* NAN, a nan has no specific sign */ |
1551 | + FLT_MAX, |
1552 | + FLT_MIN, |
1553 | + FLT_EPSILON, |
1554 | + }; |
1555 | + int const nvalues = sizeof(values) / sizeof(*values); |
1556 | + int ninputs = 1; |
1557 | +#ifdef cl_khr_fp64 |
1558 | + double const dvalues[] = { |
1559 | + 0.0, |
1560 | + 0.1, |
1561 | + 0.9, |
1562 | + 1.0, |
1563 | + 1.1, |
1564 | + 10.0, |
1565 | + 1000000.0, |
1566 | + 1000000000000.0, |
1567 | + 1000000000000000000000000.0, |
1568 | + HUGE_VAL, |
1569 | + INFINITY, |
1570 | + /* NAN, a nan has no specific sign */ |
1571 | + DBL_MAX, |
1572 | + DBL_MIN, |
1573 | + DBL_EPSILON, |
1574 | + }; |
1575 | + int const ndvalues = sizeof(dvalues) / sizeof(*dvalues); |
1576 | + ++ninputs; |
1577 | +#endif |
1578 | + |
1579 | + for (int input=0; input<ninputs; ++input) { |
1580 | + for (int iter=0; iter<nvalues; ++iter) { |
1581 | + for (int sign=-1; sign<=+1; sign+=2) { |
1582 | + typedef union { |
1583 | + vtype v; |
1584 | + stype s[16]; |
1585 | + } Tvec; |
1586 | + Tvec val, good, val2; |
1587 | + int vecsize = vec_step(vtype); |
1588 | + for (int n=0; n<vecsize; ++n) { |
1589 | + if (input==0) { |
1590 | + val.s[n] = sign * values[(iter+n) % nvalues]; |
1591 | + good.s[n] = values[(iter+n) % nvalues]; |
1592 | + val2.s[n] = values[(iter+n+1) % nvalues]; |
1593 | + } else { |
1594 | +#ifdef cl_khr_fp64 |
1595 | + val.s[n] = sign * dvalues[(iter+n) % ndvalues]; |
1596 | + good.s[n] = dvalues[(iter+n) % ndvalues]; |
1597 | + val2.s[n] = dvalues[(iter+n+1) % ndvalues]; |
1598 | +#endif |
1599 | + } |
1600 | + } |
1601 | + Tvec res; |
1602 | + bool equal; |
1603 | + typedef union { |
1604 | + stype s; |
1605 | + sjtype sj; |
1606 | + } S; |
1607 | + typedef union { |
1608 | + jtype v; |
1609 | + sjtype s[16]; |
1610 | + } Jvec; |
1611 | + /* fabs */ |
1612 | + res.v = fabs(val.v); |
1613 | + equal = true; |
1614 | + for (int n=0; n<vecsize; ++n) { |
1615 | + S r, g; |
1616 | + r.s = res.s[n]; |
1617 | + g.s = good.s[n]; |
1618 | + equal = equal && r.sj == g.sj; |
1619 | + } |
1620 | + if (!equal) { |
1621 | + printf("FAIL: fabs type=%s val=%.17g res=%.17g\n", |
1622 | + typename, val.s[0], res.s[0]); |
1623 | + } |
1624 | + /* signbit */ |
1625 | + Jvec ires; |
1626 | + ires.v = signbit(val.v); |
1627 | + equal = true; |
1628 | + for (int n=0; n<vecsize; ++n) { |
1629 | + equal = equal && ires.s[n] == (sign>0 ? 0 : vecsize==1 ? +1 : -1); |
1630 | + } |
1631 | + if (!equal) { |
1632 | + printf("FAIL: signbit type=%s val=%.17g res=%d\n", |
1633 | + typename, val.s[0], (int)ires.s[0]); |
1634 | + } |
1635 | + /* copysign */ |
1636 | + for (int sign2=-1; sign2<=+1; sign2+=2) { |
1637 | + res.v = copysign(val.v, (stype)sign2*val2.v); |
1638 | + equal = true; |
1639 | + for (int n=0; n<vecsize; ++n) { |
1640 | + S r, g; |
1641 | + r.s = res.s[n]; |
1642 | + g.s = sign2*good.s[n]; |
1643 | + equal = equal && r.sj == g.sj; |
1644 | + } |
1645 | + if (!equal) { |
1646 | + for (int n=0; n<vecsize; ++n) { |
1647 | + printf("FAIL: copysign type=%s val=%.17g sign=%.17g res=%.17g\n", |
1648 | + typename, val.s[n], sign2*val2.s[n], res.s[n]); |
1649 | + } |
1650 | + } |
1651 | + } |
1652 | + } |
1653 | + } |
1654 | + } |
1655 | + }) |
1656 | + ) |
1657 | + |
1658 | +void test_fabs() |
1659 | +{ |
1660 | + CALL_FUNC_V(test_fabs) |
1661 | +} |
1662 | |
1663 | === added file 'examples/kernel/test_rotate.cl' |
1664 | --- examples/kernel/test_rotate.cl 1970-01-01 00:00:00 +0000 |
1665 | +++ examples/kernel/test_rotate.cl 2011-12-14 15:59:07 +0000 |
1666 | @@ -0,0 +1,204 @@ |
1667 | +// TESTING: << |
1668 | +// TESTING: >> |
1669 | +// TESTING: rotate |
1670 | + |
1671 | +#define IMPLEMENT_BODY_G(NAME, BODY, GTYPE, SGTYPE, UGTYPE, SUGTYPE) \ |
1672 | + void NAME##_##GTYPE() \ |
1673 | + { \ |
1674 | + typedef GTYPE gtype; \ |
1675 | + typedef SGTYPE sgtype; \ |
1676 | + typedef UGTYPE ugtype; \ |
1677 | + typedef SUGTYPE sugtype; \ |
1678 | + char const *const typename = #GTYPE; \ |
1679 | + BODY; \ |
1680 | + } |
1681 | +#define DEFINE_BODY_G(NAME, EXPR) \ |
1682 | + IMPLEMENT_BODY_G(NAME, EXPR, char , char , uchar , uchar ) \ |
1683 | + IMPLEMENT_BODY_G(NAME, EXPR, char2 , char , uchar2 , uchar ) \ |
1684 | + IMPLEMENT_BODY_G(NAME, EXPR, char3 , char , uchar3 , uchar ) \ |
1685 | + IMPLEMENT_BODY_G(NAME, EXPR, char4 , char , uchar4 , uchar ) \ |
1686 | + IMPLEMENT_BODY_G(NAME, EXPR, char8 , char , uchar8 , uchar ) \ |
1687 | + IMPLEMENT_BODY_G(NAME, EXPR, char16 , char , uchar16 , uchar ) \ |
1688 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar , uchar , uchar , uchar ) \ |
1689 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar2 , uchar , uchar2 , uchar ) \ |
1690 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar3 , uchar , uchar3 , uchar ) \ |
1691 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar4 , uchar , uchar4 , uchar ) \ |
1692 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar8 , uchar , uchar8 , uchar ) \ |
1693 | + IMPLEMENT_BODY_G(NAME, EXPR, uchar16 , uchar , uchar16 , uchar ) \ |
1694 | + IMPLEMENT_BODY_G(NAME, EXPR, short , short , ushort , ushort) \ |
1695 | + IMPLEMENT_BODY_G(NAME, EXPR, short2 , short , ushort2 , ushort) \ |
1696 | + IMPLEMENT_BODY_G(NAME, EXPR, short3 , short , ushort3 , ushort) \ |
1697 | + IMPLEMENT_BODY_G(NAME, EXPR, short4 , short , ushort4 , ushort) \ |
1698 | + IMPLEMENT_BODY_G(NAME, EXPR, short8 , short , ushort8 , ushort) \ |
1699 | + IMPLEMENT_BODY_G(NAME, EXPR, short16 , short , ushort16, ushort) \ |
1700 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort , ushort, ushort , ushort) \ |
1701 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort2 , ushort, ushort2 , ushort) \ |
1702 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort3 , ushort, ushort3 , ushort) \ |
1703 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort4 , ushort, ushort4 , ushort) \ |
1704 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort8 , ushort, ushort8 , ushort) \ |
1705 | + IMPLEMENT_BODY_G(NAME, EXPR, ushort16, ushort, ushort16, ushort) \ |
1706 | + IMPLEMENT_BODY_G(NAME, EXPR, int , int , uint , uint ) \ |
1707 | + IMPLEMENT_BODY_G(NAME, EXPR, int2 , int , uint2 , uint ) \ |
1708 | + IMPLEMENT_BODY_G(NAME, EXPR, int3 , int , uint3 , uint ) \ |
1709 | + IMPLEMENT_BODY_G(NAME, EXPR, int4 , int , uint4 , uint ) \ |
1710 | + IMPLEMENT_BODY_G(NAME, EXPR, int8 , int , uint8 , uint ) \ |
1711 | + IMPLEMENT_BODY_G(NAME, EXPR, int16 , int , uint16 , uint ) \ |
1712 | + IMPLEMENT_BODY_G(NAME, EXPR, uint , uint , uint , uint ) \ |
1713 | + IMPLEMENT_BODY_G(NAME, EXPR, uint2 , uint , uint2 , uint ) \ |
1714 | + IMPLEMENT_BODY_G(NAME, EXPR, uint3 , uint , uint3 , uint ) \ |
1715 | + IMPLEMENT_BODY_G(NAME, EXPR, uint4 , uint , uint4 , uint ) \ |
1716 | + IMPLEMENT_BODY_G(NAME, EXPR, uint8 , uint , uint8 , uint ) \ |
1717 | + IMPLEMENT_BODY_G(NAME, EXPR, uint16 , uint , uint16 , uint ) \ |
1718 | + __IF_INT64( \ |
1719 | + IMPLEMENT_BODY_G(NAME, EXPR, long , long , ulong , ulong ) \ |
1720 | + IMPLEMENT_BODY_G(NAME, EXPR, long2 , long , ulong2 , ulong ) \ |
1721 | + IMPLEMENT_BODY_G(NAME, EXPR, long3 , long , ulong3 , ulong ) \ |
1722 | + IMPLEMENT_BODY_G(NAME, EXPR, long4 , long , ulong4 , ulong ) \ |
1723 | + IMPLEMENT_BODY_G(NAME, EXPR, long8 , long , ulong8 , ulong ) \ |
1724 | + IMPLEMENT_BODY_G(NAME, EXPR, long16 , long , ulong16 , ulong ) \ |
1725 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong , ulong , ulong , ulong ) \ |
1726 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong2 , ulong , ulong2 , ulong ) \ |
1727 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong3 , ulong , ulong3 , ulong ) \ |
1728 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong4 , ulong , ulong4 , ulong ) \ |
1729 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong8 , ulong , ulong8 , ulong ) \ |
1730 | + IMPLEMENT_BODY_G(NAME, EXPR, ulong16 , ulong , ulong16 , ulong )) |
1731 | + |
1732 | +#define CALL_FUNC_G(NAME) \ |
1733 | + NAME##_char (); \ |
1734 | + NAME##_char2 (); \ |
1735 | + NAME##_char3 (); \ |
1736 | + NAME##_char4 (); \ |
1737 | + NAME##_char8 (); \ |
1738 | + NAME##_char16 (); \ |
1739 | + NAME##_uchar (); \ |
1740 | + NAME##_uchar2 (); \ |
1741 | + NAME##_uchar3 (); \ |
1742 | + NAME##_uchar4 (); \ |
1743 | + NAME##_uchar8 (); \ |
1744 | + NAME##_uchar16 (); \ |
1745 | + NAME##_short (); \ |
1746 | + NAME##_short2 (); \ |
1747 | + NAME##_short3 (); \ |
1748 | + NAME##_short4 (); \ |
1749 | + NAME##_short8 (); \ |
1750 | + NAME##_short16 (); \ |
1751 | + NAME##_ushort (); \ |
1752 | + NAME##_ushort2 (); \ |
1753 | + NAME##_ushort3 (); \ |
1754 | + NAME##_ushort4 (); \ |
1755 | + NAME##_ushort8 (); \ |
1756 | + NAME##_ushort16(); \ |
1757 | + NAME##_int (); \ |
1758 | + NAME##_int2 (); \ |
1759 | + NAME##_int3 (); \ |
1760 | + NAME##_int4 (); \ |
1761 | + NAME##_int8 (); \ |
1762 | + NAME##_int16 (); \ |
1763 | + NAME##_uint (); \ |
1764 | + NAME##_uint2 (); \ |
1765 | + NAME##_uint3 (); \ |
1766 | + NAME##_uint4 (); \ |
1767 | + NAME##_uint8 (); \ |
1768 | + NAME##_uint16 (); \ |
1769 | + __IF_INT64( \ |
1770 | + NAME##_long (); \ |
1771 | + NAME##_long2 (); \ |
1772 | + NAME##_long3 (); \ |
1773 | + NAME##_long4 (); \ |
1774 | + NAME##_long8 (); \ |
1775 | + NAME##_long16 (); \ |
1776 | + NAME##_ulong (); \ |
1777 | + NAME##_ulong2 (); \ |
1778 | + NAME##_ulong3 (); \ |
1779 | + NAME##_ulong4 (); \ |
1780 | + NAME##_ulong8 (); \ |
1781 | + NAME##_ulong16 ();) |
1782 | + |
1783 | + |
1784 | + |
1785 | +#define is_signed(T) ((T)-1 < (T)+1) |
1786 | +#define is_floating(T) ((T)0.1 > (T)0.0) |
1787 | +#define count_bits(T) (CHAR_BIT * sizeof(T)) |
1788 | + |
1789 | +DEFINE_BODY_G |
1790 | +(test_rotate, |
1791 | + ({ |
1792 | + _cl_static_assert(sgtype, !is_floating(sgtype)); |
1793 | + int patterns[] = {0x01, 0x80, 0x77, 0xee}; |
1794 | + for (int p=0; p<4; ++p) { |
1795 | + int const bits = count_bits(sgtype); |
1796 | + int array[bits]; |
1797 | + for (int b=0; b<bits; ++b) { |
1798 | + array[b] = !!(patterns[p] & (1 << (b & 7))); |
1799 | + } |
1800 | + int vecsize = vec_step(gtype); |
1801 | + typedef union { |
1802 | + gtype v; |
1803 | + sgtype s[16]; |
1804 | + } Tvec; |
1805 | + for (int shiftbase=0; shiftbase<=bits; ++shiftbase) { |
1806 | + for (int shiftoffset=0; shiftoffset<(vecsize==1?1:4); ++shiftoffset) { |
1807 | + Tvec shift; |
1808 | + Tvec val; |
1809 | + Tvec shl, shr, rot; |
1810 | + for (int n=0; n<vecsize; ++n) { |
1811 | + shift.s[n] = shiftbase + n*shiftoffset; |
1812 | + val.s[n] = 0; |
1813 | + shl.s[n] = 0; |
1814 | + shr.s[n] = 0; |
1815 | + rot.s[n] = 0; |
1816 | + for (int b=bits; b>=0; --b) { |
1817 | + int bl = b - (shift.s[n] & (bits-1)); |
1818 | + int br = b + (shift.s[n] & (bits-1)); |
1819 | + int sign = is_signed(sgtype) ? array[bits-1] : 0; |
1820 | + val.s[n] = (val.s[n] << 1) | array[b]; |
1821 | + shl.s[n] = (shl.s[n] << 1) | (bl < 0 ? 0 : array[bl]); |
1822 | + shr.s[n] = (shr.s[n] << 1) | (br >= bits ? sign : array[br]); |
1823 | + rot.s[n] = (rot.s[n] << 1) | array[bl & (bits-1)]; |
1824 | + } |
1825 | + } |
1826 | + Tvec res; |
1827 | + bool equal; |
1828 | + /* shift left */ |
1829 | + res.v = val.v << shift.v; |
1830 | + equal = true; |
1831 | + for (int n=0; n<vecsize; ++n) { |
1832 | + equal = equal && res.s[n] == shl.s[n]; |
1833 | + } |
1834 | + if (!equal) { |
1835 | + printf("FAIL: shift left (<<) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n", |
1836 | + typename, patterns[p], shiftbase, shiftoffset, |
1837 | + (uint)res.s[0], (uint)shl.s[0]); |
1838 | + } |
1839 | + /* shift right */ |
1840 | + res.v = val.v >> shift.v; |
1841 | + equal = true; |
1842 | + for (int n=0; n<vecsize; ++n) { |
1843 | + equal = equal && res.s[n] == shr.s[n]; |
1844 | + } |
1845 | + if (!equal) { |
1846 | + printf("FAIL: shift right (>>) type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n", |
1847 | + typename, patterns[p], shiftbase, shiftoffset, |
1848 | + (uint)res.s[0], (uint)shr.s[0]); |
1849 | + } |
1850 | + /* rotate */ |
1851 | + res.v = rotate(val.v, shift.v); |
1852 | + equal = true; |
1853 | + for (int n=0; n<vecsize; ++n) { |
1854 | + equal = equal && res.s[n] == rot.s[n]; |
1855 | + } |
1856 | + if (!equal) { |
1857 | + printf("FAIL: rotate type=%s pattern=0x%x shiftbase=%d shiftoffset=%d res=0x%08x good=0x%08x\n", |
1858 | + typename, patterns[p], shiftbase, shiftoffset, |
1859 | + (uint)res.s[0], (uint)rot.s[0]); |
1860 | + } |
1861 | + } |
1862 | + } |
1863 | + } |
1864 | + }) |
1865 | + ) |
1866 | + |
1867 | +void test_rotate() |
1868 | +{ |
1869 | + CALL_FUNC_G(test_rotate) |
1870 | +} |
1871 | |
1872 | === modified file 'examples/loopbarriers/loopbarriers.c' |
1873 | --- examples/loopbarriers/loopbarriers.c 2011-11-07 18:21:03 +0000 |
1874 | +++ examples/loopbarriers/loopbarriers.c 2011-12-14 15:59:07 +0000 |
1875 | @@ -57,7 +57,7 @@ |
1876 | |
1877 | fclose(source_file); |
1878 | |
1879 | - context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, |
1880 | + context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, |
1881 | NULL, NULL, NULL); |
1882 | if (context == (cl_context)0) |
1883 | return -1; |
1884 | |
1885 | === modified file 'examples/run_all.sh' |
1886 | --- examples/run_all.sh 2011-11-08 17:02:15 +0000 |
1887 | +++ examples/run_all.sh 2011-12-14 15:59:07 +0000 |
1888 | @@ -2,7 +2,7 @@ |
1889 | # |
1890 | # Executes all the examples in a row. |
1891 | # |
1892 | -TESTS="example1 example2 barriers forloops trig" |
1893 | +TESTS="example1 example2 barriers forloops trig scalarwave" |
1894 | |
1895 | for dname in ${TESTS}; |
1896 | do |
1897 | |
1898 | === added directory 'examples/scalarwave' |
1899 | === added file 'examples/scalarwave/Makefile.am' |
1900 | --- examples/scalarwave/Makefile.am 1970-01-01 00:00:00 +0000 |
1901 | +++ examples/scalarwave/Makefile.am 2011-12-14 15:59:07 +0000 |
1902 | @@ -0,0 +1,31 @@ |
1903 | +# Process this file with automake to produce Makefile.in (in this, |
1904 | +# and all subdirectories). |
1905 | +# Makefile.am for pocl/examples/scalarwave. |
1906 | +# |
1907 | +# Copyright (c) 2011 Universidad Rey Juan Carlos |
1908 | +# |
1909 | +# Permission is hereby granted, free of charge, to any person obtaining a copy |
1910 | +# of this software and associated documentation files (the "Software"), to deal |
1911 | +# in the Software without restriction, including without limitation the rights |
1912 | +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
1913 | +# copies of the Software, and to permit persons to whom the Software is |
1914 | +# furnished to do so, subject to the following conditions: |
1915 | +# |
1916 | +# The above copyright notice and this permission notice shall be included in |
1917 | +# all copies or substantial portions of the Software. |
1918 | +# |
1919 | +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
1920 | +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
1921 | +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
1922 | +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
1923 | +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
1924 | +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
1925 | +# THE SOFTWARE. |
1926 | + |
1927 | +noinst_PROGRAMS = scalarwave |
1928 | + |
1929 | +scalarwave_SOURCES = scalarwave.c scalarwave.cl |
1930 | +scalarwave_LDADD = ../../lib/CL/libCL.la -lm @PTHREAD_LIBS@ |
1931 | +scalarwave_CFLAGS = -std=c99 @PTHREAD_CFLAGS@ |
1932 | + |
1933 | +AM_CPPFLAGS = -I$(top_srcdir)/include -U__APPLE__ -DSRCDIR='"$(abs_srcdir)"' |
1934 | |
1935 | === added file 'examples/scalarwave/scalarwave.c' |
1936 | --- examples/scalarwave/scalarwave.c 1970-01-01 00:00:00 +0000 |
1937 | +++ examples/scalarwave/scalarwave.c 2011-12-14 15:59:07 +0000 |
1938 | @@ -0,0 +1,233 @@ |
1939 | +/* scalarwave - Scalar wave evolution */ |
1940 | + |
1941 | +#define _BSD_SOURCE // define M_PI |
1942 | + |
1943 | +#include <assert.h> |
1944 | +#include <math.h> |
1945 | +#include <stdio.h> |
1946 | +#include <stdlib.h> |
1947 | + |
1948 | +#include <CL/opencl.h> |
1949 | + |
1950 | + |
1951 | + |
1952 | +#define GRID_GRANULARITY 1 // TODO 2 |
1953 | + |
1954 | +typedef struct grid_t { |
1955 | + cl_double dt; // time step |
1956 | + cl_double dx, dy, dz; // resolution |
1957 | + cl_int ai, aj, ak; // allocated size |
1958 | + cl_int ni, nj, nk; // used size |
1959 | +} grid_t; |
1960 | + |
1961 | + |
1962 | + |
1963 | +int |
1964 | +exec_scalarwave_kernel(char const *const program_source, |
1965 | + cl_double *const phi, |
1966 | + cl_double const *const phi_p, |
1967 | + cl_double const *const phi_p_p, |
1968 | + grid_t const *const grid) |
1969 | +{ |
1970 | + static int initialised = 0; |
1971 | + static cl_context context; |
1972 | + static cl_command_queue cmd_queue; |
1973 | + static cl_program program; |
1974 | + static cl_kernel kernel; |
1975 | + |
1976 | + if (!initialised) { |
1977 | + initialised = 1; |
1978 | + |
1979 | + context = |
1980 | + clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, NULL, NULL, NULL); |
1981 | + if (!context) return -1; |
1982 | + |
1983 | + size_t ndevices; |
1984 | + clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &ndevices); |
1985 | + ndevices /= sizeof(cl_device_id); |
1986 | + cl_device_id devices[ndevices]; |
1987 | + clGetContextInfo(context, CL_CONTEXT_DEVICES, |
1988 | + ndevices*sizeof(cl_device_id), devices, NULL); |
1989 | + |
1990 | + cmd_queue = |
1991 | + clCreateCommandQueue(context, devices[0], 0, NULL); |
1992 | + if (!cmd_queue) return -1; |
1993 | + |
1994 | + program = |
1995 | + clCreateProgramWithSource(context, 1, (const char**)&program_source, |
1996 | + NULL, NULL); |
1997 | + if (!program) return -1; |
1998 | + |
1999 | + int ierr; |
2000 | + ierr = clBuildProgram(program, 0, NULL, NULL, NULL, NULL); |
2001 | + if (ierr) return -1; |
2002 | + |
2003 | + kernel = clCreateKernel(program, "scalarwave", NULL); |
2004 | + if (!kernel) return -1; |
2005 | + |
2006 | + } |
2007 | + |
2008 | + size_t const npoints = grid->ai * grid->aj * grid->ak; |
2009 | + cl_mem const mem_phi = |
2010 | + clCreateBuffer(context, 0, |
2011 | + npoints*sizeof(*phi), NULL, NULL); |
2012 | + if (!mem_phi) return -1; |
2013 | + cl_mem const mem_phi_p = |
2014 | + clCreateBuffer(context, CL_MEM_COPY_HOST_PTR, |
2015 | + npoints*sizeof(*phi_p), phi_p, NULL); |
2016 | + if (!mem_phi_p) return -1; |
2017 | + cl_mem const mem_phi_p_p = |
2018 | + clCreateBuffer(context, CL_MEM_COPY_HOST_PTR, |
2019 | + npoints*sizeof(*phi_p_p), phi_p_p, NULL); |
2020 | + if (!mem_phi_p_p) return -1; |
2021 | + cl_mem const mem_grid = |
2022 | + clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, |
2023 | + sizeof(*grid), grid, NULL); |
2024 | + if (!mem_grid) return -1; |
2025 | + |
2026 | + int ierr; |
2027 | + ierr = clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem_phi); |
2028 | + if (ierr) return -1; |
2029 | + ierr = clSetKernelArg(kernel, 1, sizeof(cl_mem), &mem_phi_p); |
2030 | + if (ierr) return -1; |
2031 | + ierr = clSetKernelArg(kernel, 2, sizeof(cl_mem), &mem_phi_p_p); |
2032 | + if (ierr) return -1; |
2033 | + ierr = clSetKernelArg(kernel, 3, sizeof(cl_mem), &mem_grid); |
2034 | + if (ierr) return -1; |
2035 | + |
2036 | + size_t const global_work_size[3] = |
2037 | + {grid->ni, grid->nj, grid->nk}; |
2038 | + size_t const local_work_size[3] = |
2039 | + {GRID_GRANULARITY, GRID_GRANULARITY, GRID_GRANULARITY}; |
2040 | + |
2041 | + ierr = clEnqueueNDRangeKernel(cmd_queue, kernel, 3, NULL, |
2042 | + global_work_size, local_work_size, |
2043 | + 0, NULL, NULL); |
2044 | + if (ierr) return -1; |
2045 | + |
2046 | + ierr = clFinish(cmd_queue); |
2047 | + if (ierr) return -1; |
2048 | + |
2049 | + ierr = clEnqueueReadBuffer(cmd_queue, mem_phi, CL_TRUE, |
2050 | + 0, npoints*sizeof(*phi), phi, |
2051 | + 0, NULL, NULL); |
2052 | + if (ierr) return -1; |
2053 | + |
2054 | + clReleaseMemObject(mem_phi); |
2055 | + clReleaseMemObject(mem_phi_p); |
2056 | + clReleaseMemObject(mem_phi_p_p); |
2057 | + clReleaseMemObject(mem_grid); |
2058 | + /* clReleaseKernel(kernel); */ |
2059 | + /* clReleaseProgram(program); */ |
2060 | + /* clReleaseCommandQueue(cmd_queue); */ |
2061 | + /* clReleaseContext(context); */ |
2062 | + |
2063 | + return 0; |
2064 | +} |
2065 | + |
2066 | + |
2067 | + |
2068 | +#define ALPHA 0.5 // CFL factor |
2069 | +#define NT 4 // time steps |
2070 | +#define NX 17 // grid size |
2071 | + |
2072 | + |
2073 | + |
2074 | +// Round up to next multiple of GRID_GRANULARITY |
2075 | +static int roundup(int const nx) |
2076 | +{ |
2077 | + return (nx + GRID_GRANULARITY-1) / GRID_GRANULARITY * GRID_GRANULARITY; |
2078 | +} |
2079 | + |
2080 | +#ifndef SRCDIR |
2081 | +# define SRCDIR "." |
2082 | +#endif |
2083 | + |
2084 | +int |
2085 | +main(void) |
2086 | +{ |
2087 | + FILE *const source_file = fopen(SRCDIR "/scalarwave.cl", "r"); |
2088 | + assert(source_file != NULL && "scalarwave.cl not found!"); |
2089 | + |
2090 | + fseek(source_file, 0, SEEK_END); |
2091 | + size_t const source_size = ftell(source_file); |
2092 | + fseek(source_file, 0, SEEK_SET); |
2093 | + |
2094 | + char source[source_size + 1]; |
2095 | + fread(source, source_size, 1, source_file); |
2096 | + source[source_size] = '\0'; |
2097 | + |
2098 | + fclose(source_file); |
2099 | + |
2100 | + |
2101 | + |
2102 | + grid_t grid; |
2103 | + grid.dt = ALPHA/(NX-1); |
2104 | + grid.dx = grid.dy = grid.dz = 1.0/(NX-1); |
2105 | + grid.ai = grid.aj = grid.ak = roundup(NX); |
2106 | + grid.ni = grid.nj = grid.nk = NX; |
2107 | + |
2108 | + cl_double *restrict phi = |
2109 | + malloc (grid.ai*grid.aj*grid.ak * sizeof *phi ); |
2110 | + cl_double *restrict phi_p = |
2111 | + malloc (grid.ai*grid.aj*grid.ak * sizeof *phi_p ); |
2112 | + cl_double *restrict phi_p_p = |
2113 | + malloc (grid.ai*grid.aj*grid.ak * sizeof *phi_p_p); |
2114 | + |
2115 | + // Set up initial data (TODO: do this on the device instead) |
2116 | + printf ("Initial condition: t=%g\n", 0.0); |
2117 | + double const kx = M_PI; |
2118 | + double const ky = M_PI; |
2119 | + double const kz = M_PI; |
2120 | + double const omega = sqrt(pow(kx,2)+pow(ky,2)+pow(kz,2)); |
2121 | + for (int k = 0; k < NX; ++k) { |
2122 | + for (int j = 0; j < NX; ++j) { |
2123 | + for (int i = 0; i < NX; ++i) { |
2124 | + double const t0 = 0.0; |
2125 | + double const t1 = -grid.dt; |
2126 | + double const x = i*grid.dx; |
2127 | + double const y = j*grid.dy; |
2128 | + double const z = k*grid.dz; |
2129 | + int const ind3d = i+grid.ai*(j+grid.aj*k); |
2130 | + phi [ind3d] = sin(kx*x) * sin(ky*y) * sin(kz*z) * cos(omega*t0); |
2131 | + phi_p[ind3d] = sin(kx*x) * sin(ky*y) * sin(kz*z) * cos(omega*t1); |
2132 | + } |
2133 | + } |
2134 | + } |
2135 | + |
2136 | + // Take some time steps |
2137 | + for (int n=0; n<NT; ++n) { |
2138 | + printf ("Time step %d: t=%g\n", n+1, (n+1)*grid.dt); |
2139 | + |
2140 | + // Cycle time levels |
2141 | + { |
2142 | + cl_double *tmp = phi_p_p; |
2143 | + phi_p_p = phi_p; |
2144 | + phi_p = phi; |
2145 | + phi = tmp; |
2146 | + } |
2147 | + |
2148 | + // TODO: We create the program and allocate the buffers each time, |
2149 | + // which is slow. But then, we only want to test correctness, not |
2150 | + // performance. (Yet?) |
2151 | + int const ierr = |
2152 | + exec_scalarwave_kernel (source, phi, phi_p, phi_p_p, &grid); |
2153 | + assert(!ierr); |
2154 | + |
2155 | + } // for n |
2156 | + |
2157 | + for (int i=0; i<NX; ++i) { |
2158 | + int const j = i; |
2159 | + int const k = i; |
2160 | + double const x = grid.dx*i; |
2161 | + double const y = grid.dy*j; |
2162 | + double const z = grid.dz*k; |
2163 | + int const ind3d = i+grid.ai*(j+grid.aj*k); |
2164 | + |
2165 | + printf ("phi(%-8g,%-8g,%-8g) = %g\n", x,y,z, phi[ind3d]); |
2166 | + } |
2167 | + |
2168 | + printf ("Done.\n"); |
2169 | + |
2170 | + return 0; |
2171 | +} |
2172 | |
2173 | === added file 'examples/scalarwave/scalarwave.cl' |
2174 | --- examples/scalarwave/scalarwave.cl 1970-01-01 00:00:00 +0000 |
2175 | +++ examples/scalarwave/scalarwave.cl 2011-12-14 15:59:07 +0000 |
2176 | @@ -0,0 +1,94 @@ |
2177 | +// Evolve the scalar wave equation with Dirichlet boundaries |
2178 | + |
2179 | +/* This kernel is very short. To run efficiently, probably the |
2180 | + following optimizations need to occur: |
2181 | + - Vectorization (with the device's natural vector length) |
2182 | + - Maybe: Loop unrolling with small 3D blocks |
2183 | + - Small explicit 3D loops (to amortize stencil loads, aka "loop |
2184 | + blocking") |
2185 | + - Multi-threading (aka parallelization) |
2186 | + - Hoist setup operations (mostly integer operations) out of the |
2187 | + kernel loop |
2188 | + None of these are implemented explicitly here. We could provide |
2189 | + several optimized versions, and then compare with pocl's |
2190 | + capabilities. |
2191 | + */ |
2192 | + |
2193 | +typedef struct grid_t { |
2194 | + double dt; // time step |
2195 | + double dx, dy, dz; // resolution |
2196 | + int ai, aj, ak; // allocated size |
2197 | + int ni, nj, nk; // used size |
2198 | +} grid_t; |
2199 | + |
2200 | +void |
2201 | +scalarwave(global double *restrict const phi, |
2202 | + global double const *restrict const phi_p, |
2203 | + global double const *restrict const phi_p_p, |
2204 | + constant grid_t *restrict const grid) |
2205 | +{ |
2206 | + /* printf("dt=%g\n", grid->dt); */ |
2207 | + /* printf("dxyz=[%g,%g,%g]\n", grid->dx, grid->dy, grid->dz); */ |
2208 | + /* printf("aijk=[%d,%d,%d]\n", grid->ai, grid->aj, grid->ak); */ |
2209 | + /* printf("nijk=[%d,%d,%d]\n", grid->ni, grid->nj, grid->nk); */ |
2210 | + |
2211 | + double const dt = grid->dt; |
2212 | + |
2213 | + double const dx = grid->dx; |
2214 | + double const dy = grid->dy; |
2215 | + double const dz = grid->dz; |
2216 | + |
2217 | + double const dt2 = pown(dt,2); |
2218 | + |
2219 | + double const idx2 = pown(dx,-2); |
2220 | + double const idy2 = pown(dy,-2); |
2221 | + double const idz2 = pown(dz,-2); |
2222 | + |
2223 | + size_t const ai = grid->ai; |
2224 | + size_t const aj = grid->aj; |
2225 | + size_t const ak = grid->ak; |
2226 | + |
2227 | + size_t const ni = grid->ni; |
2228 | + size_t const nj = grid->nj; |
2229 | + size_t const nk = grid->nk; |
2230 | + |
2231 | + size_t const di = 1; |
2232 | + size_t const dj = di * ai; |
2233 | + size_t const dk = dj * aj; |
2234 | + |
2235 | +#if 0 |
2236 | + printf("work_dim =%u\n", get_work_dim()); |
2237 | + printf("global_size =[%zu,%zu,%zu]\n", get_global_size(0), get_global_size(1), get_global_size(2)); |
2238 | + printf("global_id =[%zu,%zu,%zu]\n", get_global_id(0), get_global_id(1), get_global_id(2)); |
2239 | + printf("local_size =[%zu,%zu,%zu]\n", get_local_size(0), get_local_size(1), get_local_size(2)); |
2240 | + printf("local_id =[%zu,%zu,%zu]\n", get_local_id(0), get_local_id(1), get_local_id(2)); |
2241 | + printf("num_groups =[%zu,%zu,%zu]\n", get_num_groups(0), get_num_groups(1), get_num_groups(2)); |
2242 | + printf("group_id =[%zu,%zu,%zu]\n", get_group_id(0), get_group_id(1), get_group_id(2)); |
2243 | + printf("global_offset=[%zu,%zu,%zu]\n", get_global_offset(0), get_global_offset(1), get_global_offset(2)); |
2244 | +#endif |
2245 | + |
2246 | + size_t const i = get_global_id(0); |
2247 | + size_t const j = get_global_id(1); |
2248 | + size_t const k = get_global_id(2); |
2249 | + |
2250 | + // If outside the domain, do nothing |
2251 | + if (i>=ni || j>=nj || k>=nk) return; |
2252 | + |
2253 | + size_t const ind3d = di*i + dj*j + dk*k; |
2254 | + |
2255 | + if (i==0 || i==ni-1 || j==0 || j==nj-1 || k==0 || k==nk-1) { |
2256 | + // Boundary condition |
2257 | + |
2258 | + phi[ind3d] = 0.0; |
2259 | + |
2260 | + } else { |
2261 | + // Scalar wave equation |
2262 | + |
2263 | + phi[ind3d] = |
2264 | + 2.0 * phi_p[ind3d] - phi_p_p[ind3d] + |
2265 | + dt2 * ((phi_p[ind3d-di] - 2.0*phi_p[ind3d] + phi_p[ind3d+di]) * idx2 + |
2266 | + (phi_p[ind3d-dj] - 2.0*phi_p[ind3d] + phi_p[ind3d+dj]) * idy2 + |
2267 | + (phi_p[ind3d-dk] - 2.0*phi_p[ind3d] + phi_p[ind3d+dk]) * idz2); |
2268 | + |
2269 | + } |
2270 | +} |
2271 | |
2272 | === modified file 'examples/trig/trig_exec.c' |
2273 | --- examples/trig/trig_exec.c 2011-10-24 17:55:01 +0000 |
2274 | +++ examples/trig/trig_exec.c 2011-12-14 15:59:07 +0000 |
2275 | @@ -25,7 +25,7 @@ |
2276 | cl_int err; |
2277 | |
2278 | // create the OpenCL context on a GPU device |
2279 | - context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, |
2280 | + context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU, |
2281 | NULL, NULL, NULL); |
2282 | if (context == (cl_context)0) |
2283 | return -1; |
2284 | |
2285 | === modified file 'include/_kernel.h' |
2286 | --- include/_kernel.h 2011-12-09 13:35:18 +0000 |
2287 | +++ include/_kernel.h 2011-12-14 15:59:07 +0000 |
2288 | @@ -424,14 +424,14 @@ |
2289 | |
2290 | |
2291 | /* Work-Item Functions */ |
2292 | |
2293 | -// uint get_work_dim(); |
2294 | -uint get_global_size(uint); // should return size_t |
2295 | -uint get_global_id(uint); // should return size_t |
2296 | -// size_t get_local_size(uint); |
2297 | -uint get_local_id(uint); // should return size_t |
2298 | -uint get_num_groups(uint); // should return size_t |
2299 | -uint get_group_id(uint); // should return size_t |
2300 | -// size_t get_global_offset(uint); |
2301 | +uint get_work_dim(); |
2302 | +size_t get_global_size(uint); |
2303 | +size_t get_global_id(uint); |
2304 | +size_t get_local_size(uint); |
2305 | +size_t get_local_id(uint); |
2306 | +size_t get_num_groups(uint); |
2307 | +size_t get_group_id(uint); |
2308 | +size_t get_global_offset(uint); |
2309 | |
2310 | __attribute__ ((noinline)) void barrier (cl_mem_fence_flags flags); |
2311 | |
2312 | @@ -1304,6 +1304,7 @@ |
2313 | _CL_DECLARE_FUNC_G_GG(rotate) |
2314 | _CL_DECLARE_FUNC_G_GG(sub_sat) |
2315 | _CL_DECLARE_FUNC_LG_GUG(upsample) |
2316 | +_CL_DECLARE_FUNC_G_G(popcount) |
2317 | _CL_DECLARE_FUNC_J_JJJ(mad24) |
2318 | _CL_DECLARE_FUNC_J_JJ(mul24) |
2319 | |
2320 | @@ -1613,5 +1614,5 @@ |
2321 | // shuffle2 |
2322 | |
2323 | |
2324 | |
2325 | -/* printf */ |
2326 | -// int printf(constant char * restrict format, ...); |
2327 | +int printf(const /*constant*/ char * restrict format, ...) |
2328 | + __attribute__((format(printf, 1, 2))); |
2329 | |
2330 | === modified file 'lib/CL/Makefile.am' |
2331 | --- lib/CL/Makefile.am 2011-12-07 13:02:21 +0000 |
2332 | +++ lib/CL/Makefile.am 2011-12-14 15:59:07 +0000 |
2333 | @@ -46,6 +46,7 @@ |
2334 | clCreateContext.c \ |
2335 | clGetProgramBuildInfo.c \ |
2336 | clEnqueueWriteBuffer.c \ |
2337 | + clFinish.c \ |
2338 | clGetKernelWorkGroupInfo.c \ |
2339 | pocl_cl.h \ |
2340 | devices/devices.h \ |
2341 | |
2342 | === modified file 'lib/CL/clCreateBuffer.c' |
2343 | --- lib/CL/clCreateBuffer.c 2011-10-14 10:31:27 +0000 |
2344 | +++ lib/CL/clCreateBuffer.c 2011-12-14 15:59:07 +0000 |
2345 | @@ -33,7 +33,7 @@ |
2346 | cl_mem mem; |
2347 | cl_device_id device_id; |
2348 | void *device_ptr; |
2349 | - unsigned i; |
2350 | + unsigned i, j; |
2351 | |
2352 | if (context == NULL) |
2353 | POCL_ERROR(CL_INVALID_CONTEXT); |
2354 | @@ -54,7 +54,12 @@ |
2355 | device_ptr = device_id->malloc(device_id->data, flags, size, host_ptr); |
2356 | if (device_ptr == NULL) |
2357 | { |
2358 | - clReleaseMemObject(mem); |
2359 | + for (j = 0; j < i; ++j) |
2360 | + { |
2361 | + device_id = context->devices[j]; |
2362 | + device_id->free(device_id->data, flags, mem->device_ptrs[j]); |
2363 | + } |
2364 | + free(mem); |
2365 | POCL_ERROR(CL_MEM_OBJECT_ALLOCATION_FAILURE); |
2366 | } |
2367 | mem->device_ptrs[i] = device_ptr; |
2368 | |
2369 | === modified file 'lib/CL/clCreateKernel.c' |
2370 | --- lib/CL/clCreateKernel.c 2011-12-05 11:21:04 +0000 |
2371 | +++ lib/CL/clCreateKernel.c 2011-12-14 15:59:07 +0000 |
2372 | @@ -125,6 +125,13 @@ |
2373 | sizeof (struct pocl_argument)); |
2374 | kernel->next = NULL; |
2375 | |
2376 | + /* Initialize kernel arguments (in case the user doesn't). */ |
2377 | + for (i = 0; i < kernel->num_args; ++i) |
2378 | + { |
2379 | + kernel->arguments[i].value = NULL; |
2380 | + kernel->arguments[i].size = 0; |
2381 | + } |
2382 | + |
2383 | /* Fill up automatic local arguments. */ |
2384 | for (i = 0; i < kernel->num_locals; ++i) |
2385 | { |
2386 | @@ -133,14 +140,9 @@ |
2387 | ((size_t *) lt_dlsym(dlhandle, "_local_size"))[i]; |
2388 | } |
2389 | |
2390 | - if (program->kernels == NULL) |
2391 | - program->kernels = kernel; |
2392 | - else { |
2393 | - cl_kernel k = program->kernels; |
2394 | - while (k->next != NULL) |
2395 | - k = k->next; |
2396 | - k->next = kernel; |
2397 | - } |
2398 | + cl_kernel k = program->kernels; |
2399 | + program->kernels = kernel; |
2400 | + kernel->next = k; |
2401 | |
2402 | return kernel; |
2403 | } |
2404 | |
2405 | === modified file 'lib/CL/clEnqueueNDRangeKernel.c' |
2406 | --- lib/CL/clEnqueueNDRangeKernel.c 2011-11-29 17:55:10 +0000 |
2407 | +++ lib/CL/clEnqueueNDRangeKernel.c 2011-12-14 15:59:07 +0000 |
2408 | @@ -22,6 +22,7 @@ |
2409 | */ |
2410 | |
2411 | #include "pocl_cl.h" |
2412 | +#include <assert.h> |
2413 | #include <sys/stat.h> |
2414 | #include <unistd.h> |
2415 | |
2416 | @@ -62,8 +63,10 @@ |
2417 | if (command_queue->context != kernel->context) |
2418 | return CL_INVALID_CONTEXT; |
2419 | |
2420 | - if (work_dim < 1 || work_dim > 3) |
2421 | + if (work_dim < 1 || |
2422 | + work_dim > command_queue->device->max_work_item_dimensions) |
2423 | return CL_INVALID_WORK_DIMENSION; |
2424 | + assert(command_queue->device->max_work_item_dimensions <= 3); |
2425 | |
2426 | if (global_work_offset != NULL) |
2427 | { |
2428 | @@ -82,14 +85,36 @@ |
2429 | global_y = work_dim > 1 ? global_work_size[1] : 1; |
2430 | global_z = work_dim > 2 ? global_work_size[2] : 1; |
2431 | |
2432 | + if (global_x ==0 || global_y == 0 || global_z == 0) |
2433 | + return CL_INVALID_GLOBAL_WORK_SIZE; |
2434 | + |
2435 | local_x = local_work_size[0]; |
2436 | local_y = work_dim > 1 ? local_work_size[1] : 1; |
2437 | local_z = work_dim > 2 ? local_work_size[2] : 1; |
2438 | |
2439 | + if (local_x * local_y * local_z > command_queue->device->max_work_group_size) |
2440 | + return CL_INVALID_WORK_GROUP_SIZE; |
2441 | + |
2442 | + if (local_x > command_queue->device->max_work_item_sizes[0] || |
2443 | + (work_dim > 1 && |
2444 | + local_y > command_queue->device->max_work_item_sizes[1]) || |
2445 | + (work_dim > 2 && |
2446 | + local_z > command_queue->device->max_work_item_sizes[2])) |
2447 | + return CL_INVALID_WORK_ITEM_SIZE; |
2448 | + |
2449 | + if (global_x % local_x != 0 || |
2450 | + global_y % local_y != 0 || |
2451 | + global_z % local_z != 0) |
2452 | + return CL_INVALID_WORK_GROUP_SIZE; |
2453 | + |
2454 | tmpdir = mkdtemp(template); |
2455 | if (tmpdir == NULL) |
2456 | return CL_OUT_OF_HOST_MEMORY; |
2457 | |
2458 | + if ((event_wait_list == NULL && num_events_in_wait_list > 0) || |
2459 | + (event_wait_list != NULL && num_events_in_wait_list == 0)) |
2460 | + return CL_INVALID_EVENT_WAIT_LIST; |
2461 | + |
2462 | error = snprintf(kernel_filename, POCL_FILENAME_LENGTH, |
2463 | "%s/kernel.bc", |
2464 | tmpdir); |
2465 | @@ -136,6 +161,9 @@ |
2466 | pc.num_groups[0] = global_x / local_x; |
2467 | pc.num_groups[1] = global_y / local_y; |
2468 | pc.num_groups[2] = global_z / local_z; |
2469 | + pc.global_offset[0] = offset_x; |
2470 | + pc.global_offset[1] = offset_y; |
2471 | + pc.global_offset[2] = offset_z; |
2472 | |
2473 | command_queue->device->run(command_queue->device->data, |
2474 | parallel_filename, |
2475 | |
2476 | === added file 'lib/CL/clFinish.c' |
2477 | --- lib/CL/clFinish.c 1970-01-01 00:00:00 +0000 |
2478 | +++ lib/CL/clFinish.c 2011-12-14 15:59:07 +0000 |
2479 | @@ -0,0 +1,31 @@ |
2480 | +/* OpenCL runtime library: clFinish() |
2481 | + |
2482 | + Copyright (c) 2011 Erik Schnetter |
2483 | + |
2484 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
2485 | + of this software and associated documentation files (the "Software"), to deal |
2486 | + in the Software without restriction, including without limitation the rights |
2487 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
2488 | + copies of the Software, and to permit persons to whom the Software is |
2489 | + furnished to do so, subject to the following conditions: |
2490 | + |
2491 | + The above copyright notice and this permission notice shall be included in |
2492 | + all copies or substantial portions of the Software. |
2493 | + |
2494 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
2495 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
2496 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
2497 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
2498 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
2499 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
2500 | + THE SOFTWARE. |
2501 | +*/ |
2502 | + |
2503 | +#include "pocl_cl.h" |
2504 | + |
2505 | +CL_API_ENTRY cl_int CL_API_CALL |
2506 | +clFinish(cl_command_queue command_queue) |
2507 | +{ |
2508 | + /* All operations are serialised, so we never have to wait */ |
2509 | + return CL_SUCCESS; |
2510 | +} |
2511 | |
2512 | === modified file 'lib/CL/clGetDeviceIDs.c' |
2513 | --- lib/CL/clGetDeviceIDs.c 2011-12-02 13:09:02 +0000 |
2514 | +++ lib/CL/clGetDeviceIDs.c 2011-12-14 15:59:07 +0000 |
2515 | @@ -42,12 +42,11 @@ |
2516 | return CL_INVALID_PLATFORM; |
2517 | |
2518 | // Currently - POCL supports only the host device - i.e. a CPU |
2519 | - if (device_type & CL_DEVICE_TYPE_CPU || |
2520 | - device_type & CL_DEVICE_TYPE_DEFAULT || |
2521 | - device_type & CL_DEVICE_TYPE_ALL) |
2522 | + if ((device_type & CL_DEVICE_TYPE_CPU) || |
2523 | + (device_type & CL_DEVICE_TYPE_DEFAULT)) |
2524 | num = 1; |
2525 | - else if (device_type == CL_DEVICE_TYPE_GPU || |
2526 | - device_type == CL_DEVICE_TYPE_ACCELERATOR ) |
2527 | + else if ((device_type | CL_DEVICE_TYPE_GPU) || |
2528 | + (device_type | CL_DEVICE_TYPE_ACCELERATOR)) |
2529 | num = 0; |
2530 | else |
2531 | return CL_INVALID_DEVICE_TYPE; |
2532 | |
2533 | === modified file 'lib/CL/clReleaseKernel.c' |
2534 | --- lib/CL/clReleaseKernel.c 2011-10-14 10:31:27 +0000 |
2535 | +++ lib/CL/clReleaseKernel.c 2011-12-14 15:59:07 +0000 |
2536 | @@ -26,6 +26,11 @@ |
2537 | CL_API_ENTRY cl_int CL_API_CALL |
2538 | clReleaseKernel(cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0 |
2539 | { |
2540 | - free(kernel); |
2541 | + /* TODO: Remove kernel from the program's linked list of kernels! */ |
2542 | + /* TODO: Deallocate kernel arguments! */ |
2543 | + /* In the mean time, we better don't free the kernel, but just make |
2544 | + it unusable... */ |
2545 | + /*free(kernel);*/ |
2546 | + memset(kernel, -1, sizeof *kernel); |
2547 | return CL_SUCCESS; |
2548 | } |
2549 | |
2550 | === modified file 'lib/CL/clReleaseMemObject.c' |
2551 | --- lib/CL/clReleaseMemObject.c 2011-10-14 10:31:27 +0000 |
2552 | +++ lib/CL/clReleaseMemObject.c 2011-12-14 15:59:07 +0000 |
2553 | @@ -32,7 +32,7 @@ |
2554 | for (i = 0; i < memobj->context->num_devices; ++i) |
2555 | { |
2556 | device_id = memobj->context->devices[i]; |
2557 | - device_id->free(device_id->data, memobj->device_ptrs[i]); |
2558 | + device_id->free(device_id->data, memobj->flags, memobj->device_ptrs[i]); |
2559 | } |
2560 | |
2561 | free(memobj->device_ptrs); |
2562 | |
2563 | === modified file 'lib/CL/clSetKernelArg.c' |
2564 | --- lib/CL/clSetKernelArg.c 2011-12-05 10:43:51 +0000 |
2565 | +++ lib/CL/clSetKernelArg.c 2011-12-14 15:59:07 +0000 |
2566 | @@ -45,9 +45,10 @@ |
2567 | |
2568 | p = &(kernel->arguments[arg_index]); |
2569 | |
2570 | - p->value = NULL; |
2571 | if (arg_value != NULL) |
2572 | { |
2573 | + free (p->value); |
2574 | + |
2575 | value = malloc (arg_size); |
2576 | if (value == NULL) |
2577 | return CL_OUT_OF_HOST_MEMORY; |
2578 | @@ -56,6 +57,11 @@ |
2579 | |
2580 | p->value = value; |
2581 | } |
2582 | + else |
2583 | + { |
2584 | + free (p->value); |
2585 | + p->value = NULL; |
2586 | + } |
2587 | |
2588 | p->size = arg_size; |
2589 | |
2590 | |
2591 | === modified file 'lib/CL/devices/native/native.c' |
2592 | --- lib/CL/devices/native/native.c 2011-12-05 10:43:51 +0000 |
2593 | +++ lib/CL/devices/native/native.c 2011-12-14 15:59:07 +0000 |
2594 | @@ -30,15 +30,7 @@ |
2595 | #define COMMAND_LENGTH 256 |
2596 | #define WORKGROUP_STRING_LENGTH 128 |
2597 | |
2598 | -struct pointer_list { |
2599 | - void *pointer; |
2600 | - struct pointer_list *next; |
2601 | -}; |
2602 | - |
2603 | struct data { |
2604 | - /* Buffers where host pointer is used, and thus |
2605 | - should not be deallocated on free. */ |
2606 | - struct pointer_list *host_buffers; |
2607 | /* Currently loaded kernel. */ |
2608 | cl_kernel current_kernel; |
2609 | /* Loaded kernel dynamic library handle. */ |
2610 | @@ -54,7 +46,6 @@ |
2611 | |
2612 | d = (struct data *) malloc (sizeof (struct data)); |
2613 | |
2614 | - d->host_buffers = NULL; |
2615 | d->current_kernel = NULL; |
2616 | d->current_dlhandle = 0; |
2617 | |
2618 | @@ -65,11 +56,7 @@ |
2619 | pocl_native_malloc (void *data, cl_mem_flags flags, |
2620 | size_t size, void *host_ptr) |
2621 | { |
2622 | - struct data *d; |
2623 | void *b; |
2624 | - struct pointer_list *p; |
2625 | - |
2626 | - d = (struct data *) data; |
2627 | |
2628 | if (flags & CL_MEM_COPY_HOST_PTR) |
2629 | { |
2630 | @@ -84,19 +71,6 @@ |
2631 | |
2632 | if (host_ptr != NULL) |
2633 | { |
2634 | - if (d->host_buffers == NULL) |
2635 | - d->host_buffers = malloc (sizeof (struct pointer_list)); |
2636 | - |
2637 | - p = d->host_buffers; |
2638 | - while (p->next != NULL) |
2639 | - p = p->next; |
2640 | - |
2641 | - p->next = malloc (sizeof (struct pointer_list)); |
2642 | - p = p->next; |
2643 | - |
2644 | - p->pointer = host_ptr; |
2645 | - p->next = NULL; |
2646 | - |
2647 | return host_ptr; |
2648 | } |
2649 | |
2650 | @@ -107,27 +81,16 @@ |
2651 | } |
2652 | |
2653 | void |
2654 | -pocl_native_free (void *data, void *ptr) |
2655 | +pocl_native_free (void *data, cl_mem_flags flags, void *ptr) |
2656 | { |
2657 | - struct data *d; |
2658 | - struct pointer_list *p; |
2659 | - |
2660 | - d = (struct data *) data; |
2661 | - |
2662 | - p = d->host_buffers; |
2663 | - while (p != NULL) |
2664 | - { |
2665 | - if (p->pointer == ptr) |
2666 | - return; |
2667 | - |
2668 | - p = p->next; |
2669 | - } |
2670 | + if (flags & CL_MEM_COPY_HOST_PTR) |
2671 | + return; |
2672 | |
2673 | free (ptr); |
2674 | } |
2675 | |
2676 | void |
2677 | -pocl_native_read (void *data, void *host_ptr, void *device_ptr, size_t cb) |
2678 | +pocl_native_read (void *data, void *host_ptr, const void *device_ptr, size_t cb) |
2679 | { |
2680 | if (host_ptr == device_ptr) |
2681 | return; |
2682 | @@ -284,12 +247,12 @@ |
2683 | for (i = 0; i < kernel->num_args; ++i) |
2684 | { |
2685 | if (kernel->arg_is_local[i]) |
2686 | - pocl_native_free(data, *(void **)(arguments[i])); |
2687 | + pocl_native_free(data, 0, *(void **)(arguments[i])); |
2688 | } |
2689 | for (i = kernel->num_args; |
2690 | i < kernel->num_args + kernel->num_locals; |
2691 | ++i) |
2692 | - pocl_native_free(data, *(void **)(arguments[i])); |
2693 | + pocl_native_free(data, 0, *(void **)(arguments[i])); |
2694 | } |
2695 | } |
2696 | } |
2697 | |
2698 | === modified file 'lib/CL/devices/native/native.h' |
2699 | --- lib/CL/devices/native/native.h 2011-12-05 10:43:51 +0000 |
2700 | +++ lib/CL/devices/native/native.h 2011-12-14 15:59:07 +0000 |
2701 | @@ -29,8 +29,8 @@ |
2702 | void pocl_native_init (cl_device_id device); |
2703 | void *pocl_native_malloc (void *data, cl_mem_flags flags, |
2704 | size_t size, void *host_ptr); |
2705 | -void pocl_native_free (void *data, void *ptr); |
2706 | -void pocl_native_read (void *data, void *host_ptr, void *device_ptr, size_t cb); |
2707 | +void pocl_native_free (void *data, cl_mem_flags flags, void *ptr); |
2708 | +void pocl_native_read (void *data, void *host_ptr, const void *device_ptr, size_t cb); |
2709 | void pocl_native_write (void *data, const void *host_ptr, void *device_ptr, size_t cb); |
2710 | void pocl_native_run (void *data, const char *bytecode, |
2711 | cl_kernel kernel, |
2712 | @@ -39,7 +39,7 @@ |
2713 | extern size_t pocl_native_max_work_item_sizes[]; |
2714 | |
2715 | #define POCL_DEVICES_NATIVE { \ |
2716 | - CL_DEVICE_TYPE_GPU, /* type */ \ |
2717 | + CL_DEVICE_TYPE_CPU, /* type */ \ |
2718 | 0, /* vendor_id */ \ |
2719 | 0, /* max_compute_units */ \ |
2720 | 1, /* max_work_item_dimensions */ \ |
2721 | |
2722 | === modified file 'lib/CL/devices/pthread/pthread.c' |
2723 | --- lib/CL/devices/pthread/pthread.c 2011-12-05 10:43:51 +0000 |
2724 | +++ lib/CL/devices/pthread/pthread.c 2011-12-14 15:59:07 +0000 |
2725 | @@ -37,11 +37,6 @@ |
2726 | for the thread execution. */ |
2727 | #define THREAD_COUNT_ENV "POCL_MAX_PTHREAD_COUNT" |
2728 | |
2729 | -struct pointer_list { |
2730 | - void *pointer; |
2731 | - struct pointer_list *next; |
2732 | -}; |
2733 | - |
2734 | struct thread_arguments { |
2735 | void *data; |
2736 | cl_kernel kernel; |
2737 | @@ -52,9 +47,6 @@ |
2738 | }; |
2739 | |
2740 | struct data { |
2741 | - /* Buffers where host pointer is used, and thus |
2742 | - should not be deallocated on free. */ |
2743 | - struct pointer_list *host_buffers; |
2744 | /* Currently loaded kernel. */ |
2745 | cl_kernel current_kernel; |
2746 | /* Loaded kernel dynamic library handle. */ |
2747 | @@ -63,7 +55,10 @@ |
2748 | |
2749 | static void * workgroup_thread (void *p); |
2750 | |
2751 | -size_t pocl_pthread_max_work_item_sizes[] = {1}; |
2752 | +/* This could be SIZE_T_MAX, but setting it to INT_MAX should suffice, |
2753 | + and may avoid errors in user code that uses int instead of |
2754 | + size_t */ |
2755 | +size_t pocl_pthread_max_work_item_sizes[] = {CL_INT_MAX,CL_INT_MAX,CL_INT_MAX}; |
2756 | |
2757 | void |
2758 | pocl_pthread_init (cl_device_id device) |
2759 | @@ -72,7 +67,6 @@ |
2760 | |
2761 | d = (struct data *) malloc (sizeof (struct data)); |
2762 | |
2763 | - d->host_buffers = NULL; |
2764 | d->current_kernel = NULL; |
2765 | d->current_dlhandle = 0; |
2766 | |
2767 | @@ -80,14 +74,9 @@ |
2768 | } |
2769 | |
2770 | void * |
2771 | -pocl_pthread_malloc (void *data, cl_mem_flags flags, |
2772 | - size_t size, void *host_ptr) |
2773 | +pocl_pthread_malloc (void *data, cl_mem_flags flags, size_t size, void *host_ptr) |
2774 | { |
2775 | - struct data *d; |
2776 | void *b; |
2777 | - struct pointer_list *p; |
2778 | - |
2779 | - d = (struct data *) data; |
2780 | |
2781 | if (flags & CL_MEM_COPY_HOST_PTR) |
2782 | { |
2783 | @@ -102,19 +91,6 @@ |
2784 | |
2785 | if (host_ptr != NULL) |
2786 | { |
2787 | - if (d->host_buffers == NULL) |
2788 | - d->host_buffers = malloc (sizeof (struct pointer_list)); |
2789 | - |
2790 | - p = d->host_buffers; |
2791 | - while (p->next != NULL) |
2792 | - p = p->next; |
2793 | - |
2794 | - p->next = malloc (sizeof (struct pointer_list)); |
2795 | - p = p->next; |
2796 | - |
2797 | - p->pointer = host_ptr; |
2798 | - p->next = NULL; |
2799 | - |
2800 | return host_ptr; |
2801 | } |
2802 | |
2803 | @@ -125,27 +101,16 @@ |
2804 | } |
2805 | |
2806 | void |
2807 | -pocl_pthread_free (void *data, void *ptr) |
2808 | +pocl_pthread_free (void *data, cl_mem_flags flags, void *ptr) |
2809 | { |
2810 | - struct data *d; |
2811 | - struct pointer_list *p; |
2812 | - |
2813 | - d = (struct data *) data; |
2814 | - |
2815 | - p = d->host_buffers; |
2816 | - while (p != NULL) |
2817 | - { |
2818 | - if (p->pointer == ptr) |
2819 | - return; |
2820 | - |
2821 | - p = p->next; |
2822 | - } |
2823 | + if (flags & CL_MEM_COPY_HOST_PTR) |
2824 | + return; |
2825 | |
2826 | free (ptr); |
2827 | } |
2828 | |
2829 | void |
2830 | -pocl_pthread_read (void *data, void *host_ptr, void *device_ptr, size_t cb) |
2831 | +pocl_pthread_read (void *data, void *host_ptr, const void *device_ptr, size_t cb) |
2832 | { |
2833 | if (host_ptr == device_ptr) |
2834 | return; |
2835 | @@ -188,7 +153,7 @@ |
2836 | if (access (cpuinfo, R_OK) == 0) |
2837 | { |
2838 | FILE *f = fopen (cpuinfo, "r"); |
2839 | -# define MAX_CPUINFO_SIZE 16*1024 |
2840 | +# define MAX_CPUINFO_SIZE 64*1024 |
2841 | char contents[MAX_CPUINFO_SIZE]; |
2842 | int num_read = fread (contents, 1, MAX_CPUINFO_SIZE - 1, f); |
2843 | fclose (f); |
2844 | @@ -373,6 +338,8 @@ |
2845 | /* In case the work group count is not divisible by the |
2846 | number of threads, we have to execute the remaining |
2847 | workgroups in one of the threads. */ |
2848 | + /* TODO: This is inefficient; it is better to round up when |
2849 | + calculating wgs_per_thread */ |
2850 | int leftover_wgs = num_groups_x - (num_threads*wgs_per_thread); |
2851 | |
2852 | #ifdef DEBUG_MT |
2853 | @@ -392,25 +359,25 @@ |
2854 | first_gid_x, last_gid_x); |
2855 | #endif |
2856 | |
2857 | - pc->group_id[0] = first_gid_x; |
2858 | - |
2859 | arguments[i].data = data; |
2860 | arguments[i].kernel = kernel; |
2861 | arguments[i].device = device; |
2862 | arguments[i].pc = *pc; |
2863 | + arguments[i].pc.group_id[0] = first_gid_x; |
2864 | arguments[i].workgroup = w; |
2865 | arguments[i].last_gid_x = last_gid_x; |
2866 | |
2867 | - pthread_create (&threads[i], |
2868 | - NULL, |
2869 | - workgroup_thread, |
2870 | - &arguments[i]); |
2871 | + error = pthread_create (&threads[i], |
2872 | + NULL, |
2873 | + workgroup_thread, |
2874 | + &arguments[i]); |
2875 | + assert(!error); |
2876 | } |
2877 | |
2878 | for (i = 0; i < num_threads; ++i) { |
2879 | pthread_join(threads[i], NULL); |
2880 | #ifdef DEBUG_MT |
2881 | - printf("### thread %x finished\n", (unsigned)threads[i]); |
2882 | + printf("### thread %u finished\n", (unsigned)threads[i]); |
2883 | #endif |
2884 | } |
2885 | |
2886 | @@ -468,12 +435,18 @@ |
2887 | for (i = 0; i < kernel->num_args; ++i) |
2888 | { |
2889 | if (kernel->arg_is_local[i]) |
2890 | - pocl_native_free(ta->data, *(void **)(arguments[i])); |
2891 | + { |
2892 | + pocl_pthread_free(ta->data, 0, *(void **)(arguments[i])); |
2893 | + free(arguments[i]); |
2894 | + } |
2895 | } |
2896 | for (i = kernel->num_args; |
2897 | i < kernel->num_args + kernel->num_locals; |
2898 | ++i) |
2899 | - pocl_native_free(ta->data, *(void **)(arguments[i])); |
2900 | + { |
2901 | + pocl_pthread_free(ta->data, 0, *(void **)(arguments[i])); |
2902 | + free(arguments[i]); |
2903 | + } |
2904 | |
2905 | return NULL; |
2906 | } |
2907 | |
2908 | === modified file 'lib/CL/devices/pthread/pthread.h' |
2909 | --- lib/CL/devices/pthread/pthread.h 2011-12-05 10:43:51 +0000 |
2910 | +++ lib/CL/devices/pthread/pthread.h 2011-12-14 15:59:07 +0000 |
2911 | @@ -29,8 +29,8 @@ |
2912 | void pocl_pthread_init (cl_device_id device); |
2913 | void *pocl_pthread_malloc (void *data, cl_mem_flags flags, |
2914 | size_t size, void *host_ptr); |
2915 | -void pocl_pthread_free (void *data, void *ptr); |
2916 | -void pocl_pthread_read (void *data, void *host_ptr, void *device_ptr, size_t cb); |
2917 | +void pocl_pthread_free (void *data, cl_mem_flags flags, void *ptr); |
2918 | +void pocl_pthread_read (void *data, void *host_ptr, const void *device_ptr, size_t cb); |
2919 | void pocl_pthread_write (void *data, const void *host_ptr, void *device_ptr, size_t cb); |
2920 | void pocl_pthread_run (void *data, const char *bytecode, |
2921 | cl_kernel kernel, |
2922 | @@ -39,14 +39,14 @@ |
2923 | extern size_t pocl_pthread_max_work_item_sizes[]; |
2924 | |
2925 | #define POCL_DEVICES_PTHREAD { \ |
2926 | - CL_DEVICE_TYPE_GPU, /* type */ \ |
2927 | + CL_DEVICE_TYPE_CPU, /* type */ \ |
2928 | 0, /* vendor_id */ \ |
2929 | 0, /* max_compute_units */ \ |
2930 | - 1, /* max_work_item_dimensions */ \ |
2931 | + 3, /* max_work_item_dimensions */ \ |
2932 | pocl_pthread_max_work_item_sizes, /* max_work_item_sizes */ \ |
2933 | - 1, /*max_work_group_size */ \ |
2934 | + CL_INT_MAX, /* max_work_group_size */ \ |
2935 | 0, /* preferred_vector_width_char */ \ |
2936 | - 0, /* preferred_vector_width_shortr */ \ |
2937 | + 0, /* preferred_vector_width_short */ \ |
2938 | 0, /* preferred_vector_width_int */ \ |
2939 | 0, /* preferred_vector_width_long */ \ |
2940 | 0, /* preferred_vector_width_float */ \ |
2941 | @@ -57,7 +57,7 @@ |
2942 | CL_FALSE, /* image_support */ \ |
2943 | 0, /* max_read_image_args */ \ |
2944 | 0, /* max_write_image_args */ \ |
2945 | - 0, /*image2d_max_width */ \ |
2946 | + 0, /* image2d_max_width */ \ |
2947 | 0, /* image2d_max_height */ \ |
2948 | 0, /* image3d_max_width */ \ |
2949 | 0, /* image3d_max_height */ \ |
2950 | |
2951 | === modified file 'lib/CL/pocl_cl.h' |
2952 | --- lib/CL/pocl_cl.h 2011-12-05 10:43:51 +0000 |
2953 | +++ lib/CL/pocl_cl.h 2011-12-14 15:59:07 +0000 |
2954 | @@ -98,7 +98,7 @@ |
2955 | void (*init) (cl_device_id device); |
2956 | void *(*malloc) (void *data, cl_mem_flags flags, |
2957 | size_t size, void *host_ptr); |
2958 | - void (*free) (void *data, void *ptr); |
2959 | + void (*free) (void *data, cl_mem_flags flags, void *ptr); |
2960 | void (*read) (void *data, void *host_ptr, void *device_ptr, size_t cb); |
2961 | void (*write) (void *data, const void *host_ptr, void *device_ptr, size_t cb); |
2962 | void (*run) (void *data, const char *bytecode, |
2963 | @@ -164,8 +164,8 @@ |
2964 | cl_program program; |
2965 | /* implementation */ |
2966 | lt_dlhandle dlhandle; |
2967 | - int *arg_is_pointer; |
2968 | - int *arg_is_local; |
2969 | + cl_int *arg_is_pointer; |
2970 | + cl_int *arg_is_local; |
2971 | cl_uint num_locals; |
2972 | struct pocl_argument *arguments; |
2973 | struct _cl_kernel *next; |
2974 | |
2975 | === modified file 'lib/kernel/Makefile.am' |
2976 | --- lib/kernel/Makefile.am 2011-12-01 16:44:06 +0000 |
2977 | +++ lib/kernel/Makefile.am 2011-12-14 15:59:07 +0000 |
2978 | @@ -37,10 +37,12 @@ |
2979 | $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -I$(top_builddir) -include $(top_srcdir)/include/types.h -include $(top_srcdir)/include/_kernel.h -o $@ $< |
2980 | |
2981 | .c.o: |
2982 | - $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -I$(top_builddir) -o $@ $< |
2983 | + $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -I$(top_builddir) -include $(top_srcdir)/include/types.h -o $@ $< |
2984 | |
2985 | .ll.o: |
2986 | $(LLVM_AS) -o $@ $< |
2987 | |
2988 | -$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h |
2989 | -$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h templates.h |
2990 | +# TODO: This misses the dependency on the device-specific include files |
2991 | +# TODO: Dependencies should be generated automatically |
2992 | +$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h |
2993 | +$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/_kernel.h templates.h |
2994 | |
2995 | === modified file 'lib/kernel/arm/Makefile.am' |
2996 | --- lib/kernel/arm/Makefile.am 2011-12-01 16:44:06 +0000 |
2997 | +++ lib/kernel/arm/Makefile.am 2011-12-14 15:59:07 +0000 |
2998 | @@ -58,7 +58,9 @@ |
2999 | endif |
3000 | |
3001 | .c.o: |
3002 | - $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -o $@ $< |
3003 | + $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/arm/types.h -o $@ $< |
3004 | |
3005 | -$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h |
3006 | -$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h ../templates.h |
3007 | +# TODO: This misses the dependency on the device-specific include files |
3008 | +# TODO: Dependencies should be generated automatically |
3009 | +$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/arm/types.h |
3010 | +$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/arm/types.h $(top_srcdir)/include/_kernel.h ../templates.h |
3011 | |
3012 | === modified file 'lib/kernel/clz.cl' |
3013 | --- lib/kernel/clz.cl 2011-10-27 00:18:42 +0000 |
3014 | +++ lib/kernel/clz.cl 2011-12-14 15:59:07 +0000 |
3015 | @@ -23,7 +23,7 @@ |
3016 | |
3017 | #include "templates.h" |
3018 | |
3019 | -// Intel: LZCNT (and POPCNT) |
3020 | +// Intel: LZCNT |
3021 | |
3022 | #define __builtin_clzhh __builtin_clz |
3023 | #define __builtin_clzh __builtin_clz |
3024 | |
3025 | === modified file 'lib/kernel/get_global_id.c' |
3026 | --- lib/kernel/get_global_id.c 2011-12-01 16:44:06 +0000 |
3027 | +++ lib/kernel/get_global_id.c 2011-12-14 15:59:07 +0000 |
3028 | @@ -33,14 +33,18 @@ |
3029 | extern unsigned int _local_id_y; |
3030 | extern unsigned int _local_id_z; |
3031 | |
3032 | -unsigned int |
3033 | +extern unsigned int _global_offset_x; |
3034 | +extern unsigned int _global_offset_y; |
3035 | +extern unsigned int _global_offset_z; |
3036 | + |
3037 | +size_t |
3038 | get_global_id(unsigned int dimindx) |
3039 | { |
3040 | switch(dimindx) |
3041 | { |
3042 | - case 0: return _local_size_x * _group_id_x + _local_id_x; |
3043 | - case 1: return _local_size_y * _group_id_y + _local_id_y; |
3044 | - case 2: return _local_size_z * _group_id_z + _local_id_z; |
3045 | + case 0: return _global_offset_x + _local_size_x * _group_id_x + _local_id_x; |
3046 | + case 1: return _global_offset_y + _local_size_y * _group_id_y + _local_id_y; |
3047 | + case 2: return _global_offset_z + _local_size_z * _group_id_z + _local_id_z; |
3048 | default: return 0; |
3049 | } |
3050 | } |
3051 | |
3052 | === added file 'lib/kernel/get_global_offset.c' |
3053 | --- lib/kernel/get_global_offset.c 1970-01-01 00:00:00 +0000 |
3054 | +++ lib/kernel/get_global_offset.c 2011-12-14 15:59:07 +0000 |
3055 | @@ -0,0 +1,39 @@ |
3056 | +/* OpenCL built-in library: get_global_offset() |
3057 | + |
3058 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
3059 | + |
3060 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
3061 | + of this software and associated documentation files (the "Software"), to deal |
3062 | + in the Software without restriction, including without limitation the rights |
3063 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
3064 | + copies of the Software, and to permit persons to whom the Software is |
3065 | + furnished to do so, subject to the following conditions: |
3066 | + |
3067 | + The above copyright notice and this permission notice shall be included in |
3068 | + all copies or substantial portions of the Software. |
3069 | + |
3070 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
3071 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
3072 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
3073 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
3074 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
3075 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
3076 | + THE SOFTWARE. |
3077 | +*/ |
3078 | + |
3079 | +extern unsigned int _global_offset_x; |
3080 | +extern unsigned int _global_offset_y; |
3081 | +extern unsigned int _global_offset_z; |
3082 | + |
3083 | +size_t |
3084 | +get_global_offset(unsigned int dimindx) |
3085 | +{ |
3086 | + switch(dimindx) |
3087 | + { |
3088 | + case 0: return _global_offset_x; |
3089 | + case 1: return _global_offset_y; |
3090 | + case 2: return _global_offset_z; |
3091 | + default: return 0; |
3092 | + } |
3093 | +} |
3094 | + |
3095 | |
3096 | === modified file 'lib/kernel/get_global_size.c' |
3097 | --- lib/kernel/get_global_size.c 2011-12-01 16:44:06 +0000 |
3098 | +++ lib/kernel/get_global_size.c 2011-12-14 15:59:07 +0000 |
3099 | @@ -29,7 +29,7 @@ |
3100 | extern unsigned int _num_groups_y; |
3101 | extern unsigned int _num_groups_z; |
3102 | |
3103 | -unsigned int |
3104 | +size_t |
3105 | get_global_size(unsigned int dimindx) |
3106 | { |
3107 | switch(dimindx) |
3108 | |
3109 | === modified file 'lib/kernel/get_group_id.c' |
3110 | --- lib/kernel/get_group_id.c 2011-12-01 16:44:06 +0000 |
3111 | +++ lib/kernel/get_group_id.c 2011-12-14 15:59:07 +0000 |
3112 | @@ -25,7 +25,7 @@ |
3113 | extern unsigned int _group_id_y; |
3114 | extern unsigned int _group_id_z; |
3115 | |
3116 | -unsigned int |
3117 | +size_t |
3118 | get_group_id(unsigned int dimindx) |
3119 | { |
3120 | switch(dimindx) |
3121 | |
3122 | === modified file 'lib/kernel/get_local_id.c' |
3123 | --- lib/kernel/get_local_id.c 2011-12-01 16:44:06 +0000 |
3124 | +++ lib/kernel/get_local_id.c 2011-12-14 15:59:07 +0000 |
3125 | @@ -25,14 +25,14 @@ |
3126 | extern unsigned int _local_id_y; |
3127 | extern unsigned int _local_id_z; |
3128 | |
3129 | -unsigned int |
3130 | +size_t |
3131 | get_local_id(unsigned int dimindx) |
3132 | { |
3133 | switch(dimindx) |
3134 | { |
3135 | - case 0: return _local_id_x; |
3136 | - case 1: return _local_id_y; |
3137 | - case 2: return _local_id_z; |
3138 | + case 0: return _local_id_x; |
3139 | + case 1: return _local_id_y; |
3140 | + case 2: return _local_id_z; |
3141 | default: return 0; |
3142 | } |
3143 | } |
3144 | |
3145 | === added file 'lib/kernel/get_local_size.c' |
3146 | --- lib/kernel/get_local_size.c 1970-01-01 00:00:00 +0000 |
3147 | +++ lib/kernel/get_local_size.c 2011-12-14 15:59:07 +0000 |
3148 | @@ -0,0 +1,39 @@ |
3149 | +/* OpenCL built-in library: get_local_size() |
3150 | + |
3151 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
3152 | + |
3153 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
3154 | + of this software and associated documentation files (the "Software"), to deal |
3155 | + in the Software without restriction, including without limitation the rights |
3156 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
3157 | + copies of the Software, and to permit persons to whom the Software is |
3158 | + furnished to do so, subject to the following conditions: |
3159 | + |
3160 | + The above copyright notice and this permission notice shall be included in |
3161 | + all copies or substantial portions of the Software. |
3162 | + |
3163 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
3164 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
3165 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
3166 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
3167 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
3168 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
3169 | + THE SOFTWARE. |
3170 | +*/ |
3171 | + |
3172 | +extern unsigned int _local_size_x; |
3173 | +extern unsigned int _local_size_y; |
3174 | +extern unsigned int _local_size_z; |
3175 | + |
3176 | +size_t |
3177 | +get_local_size(unsigned int dimindx) |
3178 | +{ |
3179 | + switch(dimindx) |
3180 | + { |
3181 | + case 0: return _local_size_x; |
3182 | + case 1: return _local_size_y; |
3183 | + case 2: return _local_size_z; |
3184 | + default: return 0; |
3185 | + } |
3186 | +} |
3187 | + |
3188 | |
3189 | === modified file 'lib/kernel/get_num_groups.c' |
3190 | --- lib/kernel/get_num_groups.c 2011-12-01 16:44:06 +0000 |
3191 | +++ lib/kernel/get_num_groups.c 2011-12-14 15:59:07 +0000 |
3192 | @@ -25,14 +25,14 @@ |
3193 | extern unsigned int _num_groups_y; |
3194 | extern unsigned int _num_groups_z; |
3195 | |
3196 | -unsigned int |
3197 | +size_t |
3198 | get_num_groups(unsigned int dimindx) |
3199 | { |
3200 | switch(dimindx) |
3201 | { |
3202 | - case 0: return _num_groups_x; |
3203 | - case 1: return _num_groups_y; |
3204 | - case 2: return _num_groups_z; |
3205 | + case 0: return _num_groups_x; |
3206 | + case 1: return _num_groups_y; |
3207 | + case 2: return _num_groups_z; |
3208 | default: return 0; |
3209 | } |
3210 | } |
3211 | |
3212 | === added file 'lib/kernel/get_work_dim.c' |
3213 | --- lib/kernel/get_work_dim.c 1970-01-01 00:00:00 +0000 |
3214 | +++ lib/kernel/get_work_dim.c 2011-12-14 15:59:07 +0000 |
3215 | @@ -0,0 +1,31 @@ |
3216 | +/* OpenCL built-in library: get_work_dim() |
3217 | + |
3218 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
3219 | + |
3220 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
3221 | + of this software and associated documentation files (the "Software"), to deal |
3222 | + in the Software without restriction, including without limitation the rights |
3223 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
3224 | + copies of the Software, and to permit persons to whom the Software is |
3225 | + furnished to do so, subject to the following conditions: |
3226 | + |
3227 | + The above copyright notice and this permission notice shall be included in |
3228 | + all copies or substantial portions of the Software. |
3229 | + |
3230 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
3231 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
3232 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
3233 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
3234 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
3235 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
3236 | + THE SOFTWARE. |
3237 | +*/ |
3238 | + |
3239 | +extern unsigned int _work_dim; |
3240 | + |
3241 | +unsigned int |
3242 | +get_work_dim() |
3243 | +{ |
3244 | + return _work_dim; |
3245 | +} |
3246 | + |
3247 | |
3248 | === added file 'lib/kernel/popcount.cl' |
3249 | --- lib/kernel/popcount.cl 1970-01-01 00:00:00 +0000 |
3250 | +++ lib/kernel/popcount.cl 2011-12-14 15:59:07 +0000 |
3251 | @@ -0,0 +1,35 @@ |
3252 | +/* OpenCL built-in library: popcount() |
3253 | + |
3254 | + Copyright (c) 2011 Universidad Rey Juan Carlos |
3255 | + |
3256 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
3257 | + of this software and associated documentation files (the "Software"), to deal |
3258 | + in the Software without restriction, including without limitation the rights |
3259 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
3260 | + copies of the Software, and to permit persons to whom the Software is |
3261 | + furnished to do so, subject to the following conditions: |
3262 | + |
3263 | + The above copyright notice and this permission notice shall be included in |
3264 | + all copies or substantial portions of the Software. |
3265 | + |
3266 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
3267 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
3268 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
3269 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
3270 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
3271 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
3272 | + THE SOFTWARE. |
3273 | +*/ |
3274 | + |
3275 | +#include "templates.h" |
3276 | + |
3277 | +// Intel: POPCNT |
3278 | + |
3279 | +#define __builtin_popcounthh __builtin_popcount |
3280 | +#define __builtin_popcounth __builtin_popcount |
3281 | +#define __builtin_popcountuhh __builtin_popcount |
3282 | +#define __builtin_popcountuh __builtin_popcount |
3283 | +#define __builtin_popcountu __builtin_popcount |
3284 | +#define __builtin_popcountul __builtin_popcountl |
3285 | + |
3286 | +DEFINE_BUILTIN_G_G(popcount) |
3287 | |
3288 | === modified file 'lib/kernel/signbit.cl' |
3289 | --- lib/kernel/signbit.cl 2011-11-05 00:10:25 +0000 |
3290 | +++ lib/kernel/signbit.cl 2011-12-14 15:59:07 +0000 |
3291 | @@ -21,6 +21,37 @@ |
3292 | THE SOFTWARE. |
3293 | */ |
3294 | |
3295 | -#include "templates.h" |
3296 | - |
3297 | -DEFINE_BUILTIN_K_V(signbit) |
3298 | +#define IMPLEMENT_SIGNBIT_BUILTIN_FLOAT __builtin_signbitf(a) |
3299 | +#define IMPLEMENT_SIGNBIT_BUILTIN_DOUBLE __builtin_signbit(a) |
3300 | +#define IMPLEMENT_SIGNBIT_DIRECT \ |
3301 | + ({ \ |
3302 | + int bits = CHAR_BIT * sizeof(stype); \ |
3303 | + *(jtype*)&a >> (jtype)(bits-1); \ |
3304 | + }) |
3305 | + |
3306 | +#define IMPLEMENT_DIRECT(NAME, VTYPE, STYPE, JTYPE, EXPR) \ |
3307 | + JTYPE _cl_overloadable NAME(VTYPE a) \ |
3308 | + { \ |
3309 | + typedef VTYPE vtype; \ |
3310 | + typedef STYPE stype; \ |
3311 | + typedef JTYPE jtype; \ |
3312 | + return EXPR; \ |
3313 | + } |
3314 | + |
3315 | + |
3316 | + |
3317 | +IMPLEMENT_DIRECT(signbit, float , float, int , IMPLEMENT_SIGNBIT_BUILTIN_FLOAT) |
3318 | +IMPLEMENT_DIRECT(signbit, float2 , float, int2 , IMPLEMENT_SIGNBIT_DIRECT) |
3319 | +IMPLEMENT_DIRECT(signbit, float3 , float, int3 , IMPLEMENT_SIGNBIT_DIRECT) |
3320 | +IMPLEMENT_DIRECT(signbit, float4 , float, int4 , IMPLEMENT_SIGNBIT_DIRECT) |
3321 | +IMPLEMENT_DIRECT(signbit, float8 , float, int8 , IMPLEMENT_SIGNBIT_DIRECT) |
3322 | +IMPLEMENT_DIRECT(signbit, float16, float, int16, IMPLEMENT_SIGNBIT_DIRECT) |
3323 | + |
3324 | +#ifdef cl_khr_fp64 |
3325 | +IMPLEMENT_DIRECT(signbit, double , double, int , IMPLEMENT_SIGNBIT_BUILTIN_DOUBLE) |
3326 | +IMPLEMENT_DIRECT(signbit, double2 , double, long2 , IMPLEMENT_SIGNBIT_DIRECT) |
3327 | +IMPLEMENT_DIRECT(signbit, double3 , double, long3 , IMPLEMENT_SIGNBIT_DIRECT) |
3328 | +IMPLEMENT_DIRECT(signbit, double4 , double, long4 , IMPLEMENT_SIGNBIT_DIRECT) |
3329 | +IMPLEMENT_DIRECT(signbit, double8 , double, long8 , IMPLEMENT_SIGNBIT_DIRECT) |
3330 | +IMPLEMENT_DIRECT(signbit, double16, double, long16, IMPLEMENT_SIGNBIT_DIRECT) |
3331 | +#endif |
3332 | |
3333 | === modified file 'lib/kernel/sources.mk' |
3334 | --- lib/kernel/sources.mk 2011-11-14 19:14:01 +0000 |
3335 | +++ lib/kernel/sources.mk 2011-12-14 15:59:07 +0000 |
3336 | @@ -1,10 +1,13 @@ |
3337 | libkernel_a_SOURCES = templates.h \ |
3338 | barrier.c \ |
3339 | + get_work_dim.c \ |
3340 | get_global_size.c \ |
3341 | get_global_id.c \ |
3342 | + get_local_size.c \ |
3343 | get_local_id.c \ |
3344 | get_num_groups.c \ |
3345 | get_group_id.c \ |
3346 | + get_global_offset.c \ |
3347 | as_type.cl \ |
3348 | convert_type.cl \ |
3349 | acos.cl \ |
3350 | @@ -84,6 +87,7 @@ |
3351 | rotate.cl \ |
3352 | sub_sat.cl \ |
3353 | upsample.cl \ |
3354 | + popcount.cl \ |
3355 | mad24.cl \ |
3356 | mul24.cl \ |
3357 | degrees.cl \ |
3358 | |
3359 | === modified file 'lib/kernel/tce/Makefile.am' |
3360 | --- lib/kernel/tce/Makefile.am 2011-12-01 16:44:06 +0000 |
3361 | +++ lib/kernel/tce/Makefile.am 2011-12-14 15:59:07 +0000 |
3362 | @@ -53,7 +53,9 @@ |
3363 | endif |
3364 | |
3365 | .c.o: |
3366 | - $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -o $@ $< |
3367 | + $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/tce/types.h -o $@ $< |
3368 | |
3369 | -$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h |
3370 | -$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h ../templates.h |
3371 | +# TODO: This misses the dependency on the device-specific include files |
3372 | +# TODO: Dependencies should be generated automatically |
3373 | +$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/tce/types.h |
3374 | +$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/tce/types.h $(top_srcdir)/include/_kernel.h ../templates.h |
3375 | |
3376 | === modified file 'lib/kernel/templates.h' |
3377 | --- lib/kernel/templates.h 2011-11-08 15:13:28 +0000 |
3378 | +++ lib/kernel/templates.h 2011-12-14 15:59:07 +0000 |
3379 | @@ -267,12 +267,11 @@ |
3380 | { \ |
3381 | return __builtin_##NAME(a); \ |
3382 | } \ |
3383 | - __IF_INT64( \ |
3384 | IMPLEMENT_BUILTIN_K_V(NAME, long2 , double2 , lo, hi) \ |
3385 | IMPLEMENT_BUILTIN_K_V(NAME, long3 , double3 , lo, s2) \ |
3386 | IMPLEMENT_BUILTIN_K_V(NAME, long4 , double4 , lo, hi) \ |
3387 | IMPLEMENT_BUILTIN_K_V(NAME, long8 , double8 , lo, hi) \ |
3388 | - IMPLEMENT_BUILTIN_K_V(NAME, long16, double16, lo, hi))) |
3389 | + IMPLEMENT_BUILTIN_K_V(NAME, long16, double16, lo, hi)) |
3390 | |
3391 | /******************************************************************************/ |
3392 | |
3393 | |
3394 | === modified file 'lib/kernel/x86_64/Makefile.am' |
3395 | --- lib/kernel/x86_64/Makefile.am 2011-12-01 16:44:06 +0000 |
3396 | +++ lib/kernel/x86_64/Makefile.am 2011-12-14 15:59:07 +0000 |
3397 | @@ -51,10 +51,12 @@ |
3398 | endif |
3399 | |
3400 | .c.o: |
3401 | - $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -o $@ $< |
3402 | + $(CLANG) $(AM_CPPFLAGS) $(CLANGFLAGS) -c -emit-llvm -include $(top_srcdir)/include/x86_64/types.h -o $@ $< |
3403 | |
3404 | .ll.o: |
3405 | $(LLVM_AS) -o $@ $< |
3406 | |
3407 | -$(libkernel_a_SOURCES:.c=.o): $(top_srcdir)/include/_kernel.h |
3408 | -$(libkernel_a_SOURCES:.cl=.o): $(top_srcdir)/include/_kernel.h ../templates.h |
3409 | +# TODO: This misses the dependency on the device-specific include files |
3410 | +# TODO: Dependencies should be generated automatically |
3411 | +$(patsubst %.c, %.o, $(filter %.c, $(libkernel_a_SOURCES))): $(top_srcdir)/include/x86_64/types.h |
3412 | +$(patsubst %.cl, %.o, $(filter %.cl, $(libkernel_a_SOURCES))): $(top_srcdir)/include/types.h $(top_srcdir)/include/x86_64/types.h $(top_srcdir)/include/_kernel.h ../templates.h |
3413 | |
3414 | === removed file 'lib/kernel/x86_64/copysign.cl' |
3415 | --- lib/kernel/x86_64/copysign.cl 2011-10-31 17:00:12 +0000 |
3416 | +++ lib/kernel/x86_64/copysign.cl 1970-01-01 00:00:00 +0000 |
3417 | @@ -1,169 +0,0 @@ |
3418 | -/* OpenCL built-in library: copysign() |
3419 | - |
3420 | - Copyright (c) 2011 Universidad Rey Juan Carlos |
3421 | - |
3422 | - Permission is hereby granted, free of charge, to any person obtaining a copy |
3423 | - of this software and associated documentation files (the "Software"), to deal |
3424 | - in the Software without restriction, including without limitation the rights |
3425 | - to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
3426 | - copies of the Software, and to permit persons to whom the Software is |
3427 | - furnished to do so, subject to the following conditions: |
3428 | - |
3429 | - The above copyright notice and this permission notice shall be included in |
3430 | - all copies or substantial portions of the Software. |
3431 | - |
3432 | - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
3433 | - IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
3434 | - FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
3435 | - AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
3436 | - LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
3437 | - OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
3438 | - THE SOFTWARE. |
3439 | -*/ |
3440 | - |
3441 | -#if 0 |
3442 | - |
3443 | -#include "../templates.h" |
3444 | - |
3445 | -// LLVM generates non-optimal code for this implementation |
3446 | -DEFINE_EXPR_V_VV(copysign, |
3447 | - ({ |
3448 | - int bits = CHAR_BIT * sizeof(stype); |
3449 | - jtype sign_mask = (jtype)1 << (jtype)(bits - 1); |
3450 | - jtype result = ((~sign_mask & *(jtype*)&a) | |
3451 | - ( sign_mask & *(jtype*)&b)); |
3452 | - *(vtype*)&result; |
3453 | - })) |
3454 | - |
3455 | -#endif |
3456 | - |
3457 | - |
3458 | - |
3459 | -#define IMPLEMENT_DIRECT(NAME, TYPE, EXPR) \ |
3460 | - TYPE _cl_overloadable NAME(TYPE a, TYPE b) \ |
3461 | - { \ |
3462 | - return EXPR; \ |
3463 | - } |
3464 | - |
3465 | -#define IMPLEMENT_UPCAST(NAME, TYPE, UPTYPE, LO) \ |
3466 | - TYPE _cl_overloadable NAME(TYPE a, TYPE b) \ |
3467 | - { \ |
3468 | - return NAME(*(UPTYPE*)&a, *(UPTYPE*)&b).LO; \ |
3469 | - } |
3470 | - |
3471 | -#define IMPLEMENT_SPLIT(NAME, TYPE, LO, HI) \ |
3472 | - TYPE _cl_overloadable NAME(TYPE a, TYPE b) \ |
3473 | - { \ |
3474 | - return (TYPE)(NAME(a.LO, b.LO), NAME(a.HI, b.HI)); \ |
3475 | - } |
3476 | - |
3477 | - |
3478 | - |
3479 | -#define IMPLEMENT_COPYSIGN_DIRECT \ |
3480 | - ({ \ |
3481 | - int bits = CHAR_BIT * sizeof(stype); \ |
3482 | - jtype sign_mask = (jtype)1 << (jtype)(bits - 1); \ |
3483 | - jtype result = (~sign_mask & *(jtype*)&a) | (sign_mask & *(jtype*)&b); \ |
3484 | - *(vtype*)&result; \ |
3485 | - }) |
3486 | -#define IMPLEMENT_COPYSIGN_SSE_FLOAT4 \ |
3487 | - ({ \ |
3488 | - uint4 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
3489 | - __asm__ ("andps %[src], %[dst]" : \ |
3490 | - [dst] "+x" (a) : \ |
3491 | - [src] "x" (~sign_mask)); \ |
3492 | - __asm__ ("andps %[src], %[dst]" : \ |
3493 | - [dst] "+x" (b) : \ |
3494 | - [src] "x" (sign_mask)); \ |
3495 | - __asm__ ("orps %[src], %[dst]" : \ |
3496 | - [dst] "+x" (a) : \ |
3497 | - [src] "x" (b)); \ |
3498 | - a; \ |
3499 | - }) |
3500 | -#define IMPLEMENT_COPYSIGN_AVX_FLOAT8 \ |
3501 | - ({ \ |
3502 | - uint8 sign_mask = {0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U, \ |
3503 | - 0x80000000U, 0x80000000U, 0x80000000U, 0x80000000U}; \ |
3504 | - __asm__ ("andps256 %[src], %[dst]" : \ |
3505 | - [dst] "+x" (a) : \ |
3506 | - [src] "x" (~sign_mask)); \ |
3507 | - __asm__ ("andps256 %[src], %[dst]" : \ |
3508 | - [dst] "+x" (b) : \ |
3509 | - [src] "x" (sign_mask)); \ |
3510 | - __asm__ ("orps256 %[src], %[dst]" : \ |
3511 | - [dst] "+x" (a) : \ |
3512 | - [b] "x" (b)); \ |
3513 | - a; \ |
3514 | - }) |
3515 | -#define IMPLEMENT_COPYSIGN_SSE2_DOUBLE2 \ |
3516 | - ({ \ |
3517 | - ulong2 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL}; \ |
3518 | - __asm__ ("andpd %[src], %[dst]" : \ |
3519 | - [dst] "+x" (a) : \ |
3520 | - [src] "x" (~sign_mask)); \ |
3521 | - __asm__ ("andpd %[src], %[dst]" : \ |
3522 | - [dst] "+x" (b) : \ |
3523 | - [src] "x" (sign_mask)); \ |
3524 | - __asm__ ("orpd %[src], %[dst]" : \ |
3525 | - [dst] "+x" (a) : \ |
3526 | - [src] "x" (b)); \ |
3527 | - a; \ |
3528 | - }) |
3529 | -#define IMPLEMENT_COPYSIGN_AVX_DOUBLE4 \ |
3530 | - ({ \ |
3531 | - ulong4 sign_mask = {0x8000000000000000UL, 0x8000000000000000UL, \ |
3532 | - 0x8000000000000000UL, 0x8000000000000000UL}; \ |
3533 | - __asm__ ("andpd256 %[src], %[dst]" : \ |
3534 | - [dst] "+x" (a) : \ |
3535 | - [src] "x" (~sign_mask)); \ |
3536 | - __asm__ ("andpd256 %[src], %[dst]" : \ |
3537 | - [dst] "+x" (b) : \ |
3538 | - [src] "x" (sign_mask)); \ |
3539 | - __asm__ ("orpd256 %[src], %[dst]" : \ |
3540 | - [dst] "+x" (a) : \ |
3541 | - [src] "x" (b)); \ |
3542 | - a; \ |
3543 | - }) |
3544 | - |
3545 | - |
3546 | - |
3547 | -#ifdef __SSE__ |
3548 | -IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_SSE_FLOAT4) |
3549 | -IMPLEMENT_UPCAST(copysign, float2 , float4, lo) |
3550 | -IMPLEMENT_UPCAST(copysign, float3 , float4, s012) |
3551 | -IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_SSE_FLOAT4) |
3552 | -# ifdef __AVX__ |
3553 | -IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_AVX_FLOAT8) |
3554 | -# else |
3555 | -IMPLEMENT_SPLIT (copysign, float8 , lo, hi) |
3556 | -# endif |
3557 | -IMPLEMENT_SPLIT (copysign, float16, lo, hi) |
3558 | -#else |
3559 | -IMPLEMENT_DIRECT(copysign, float , IMPLEMENT_COPYSIGN_DIRECT) |
3560 | -IMPLEMENT_DIRECT(copysign, float2 , IMPLEMENT_COPYSIGN_DIRECT) |
3561 | -IMPLEMENT_DIRECT(copysign, float3 , IMPLEMENT_COPYSIGN_DIRECT) |
3562 | -IMPLEMENT_DIRECT(copysign, float4 , IMPLEMENT_COPYSIGN_DIRECT) |
3563 | -IMPLEMENT_DIRECT(copysign, float8 , IMPLEMENT_COPYSIGN_DIRECT) |
3564 | -IMPLEMENT_DIRECT(copysign, float16, IMPLEMENT_COPYSIGN_DIRECT) |
3565 | -#endif |
3566 | - |
3567 | -#ifdef __SSE2__ |
3568 | -IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2) |
3569 | -IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_SSE2_DOUBLE2) |
3570 | -# ifdef __AVX__ |
3571 | -IMPLEMENT_UPCAST(copysign, double3 , double4, s012) |
3572 | -IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_AVX_DOUBLE4) |
3573 | -# else |
3574 | -IMPLEMENT_SPLIT (copysign, double3 , lo, s2) |
3575 | -IMPLEMENT_SPLIT (copysign, double4 , lo, hi) |
3576 | -# endif |
3577 | -IMPLEMENT_SPLIT (copysign, double8 , lo, hi) |
3578 | -IMPLEMENT_SPLIT (copysign, double16, lo, hi) |
3579 | -#else |
3580 | -IMPLEMENT_DIRECT(copysign, double , IMPLEMENT_COPYSIGN_DIRECT) |
3581 | -IMPLEMENT_DIRECT(copysign, double2 , IMPLEMENT_COPYSIGN_DIRECT) |
3582 | -IMPLEMENT_DIRECT(copysign, double3 , IMPLEMENT_COPYSIGN_DIRECT) |
3583 | -IMPLEMENT_DIRECT(copysign, double4 , IMPLEMENT_COPYSIGN_DIRECT) |
3584 | -IMPLEMENT_DIRECT(copysign, double8 , IMPLEMENT_COPYSIGN_DIRECT) |
3585 | -IMPLEMENT_DIRECT(copysign, double16, IMPLEMENT_COPYSIGN_DIRECT) |
3586 | -#endif |
3587 | |
3588 | === modified file 'lib/llvmopencl/Workgroup.cc' |
3589 | --- lib/llvmopencl/Workgroup.cc 2011-12-02 15:11:21 +0000 |
3590 | +++ lib/llvmopencl/Workgroup.cc 2011-12-14 15:59:07 +0000 |
3591 | @@ -184,6 +184,23 @@ |
3592 | |
3593 | IRBuilder<> builder(BasicBlock::Create(M.getContext(), "", L)); |
3594 | |
3595 | + // TODO: _num_groups_%c and friends should probably have type size_t |
3596 | + // instead of unsigned int, because this may avoid integer |
3597 | + // conversions when accessing these variables |
3598 | + |
3599 | + // TODO: _num_groups_%c and friends should probably be stored as |
3600 | + // arrays instead of as 3 independent variables, because this may |
3601 | + // lead to better code when the respective get_* functions are |
3602 | + // called in a loop (array access instead of switch statement) |
3603 | + |
3604 | + ptr = builder.CreateStructGEP(ai, |
3605 | + TypeBuilder<PoclContext, true>::WORK_DIM); |
3606 | + gv = M.getGlobalVariable("_work_dim"); |
3607 | + if (gv != NULL) { |
3608 | + v = builder.CreateLoad(builder.CreateConstGEP1_32(ptr, 0)); |
3609 | + builder.CreateStore(v, gv); |
3610 | + } |
3611 | + |
3612 | ptr = builder.CreateStructGEP(ai, |
3613 | TypeBuilder<PoclContext, true>::GROUP_ID); |
3614 | for (int i = 0; i < 3; ++i) { |
3615 | @@ -195,6 +212,8 @@ |
3616 | } |
3617 | } |
3618 | |
3619 | + ptr = builder.CreateStructGEP(ai, |
3620 | + TypeBuilder<PoclContext, true>::NUM_GROUPS); |
3621 | for (int i = 0; i < 3; ++i) { |
3622 | snprintf(s, STRING_LENGTH, "_num_groups_%c", 'x' + i); |
3623 | gv = M.getGlobalVariable(s); |
3624 | @@ -204,6 +223,17 @@ |
3625 | } |
3626 | } |
3627 | |
3628 | + ptr = builder.CreateStructGEP(ai, |
3629 | + TypeBuilder<PoclContext, true>::GLOBAL_OFFSET); |
3630 | + for (int i = 0; i < 3; ++i) { |
3631 | + snprintf(s, STRING_LENGTH, "_global_offset_%c", 'x' + i); |
3632 | + gv = M.getGlobalVariable(s); |
3633 | + if (gv != NULL) { |
3634 | + v = builder.CreateLoad(builder.CreateConstGEP2_32(ptr, 0, i)); |
3635 | + builder.CreateStore(v, gv); |
3636 | + } |
3637 | + } |
3638 | + |
3639 | CallInst *c = builder.CreateCall(F, ArrayRef<Value*>(arguments)); |
3640 | builder.CreateRetVoid(); |
3641 | |
3642 | @@ -264,6 +294,44 @@ |
3643 | } |
3644 | } |
3645 | |
3646 | + // Privatize _work_dim |
3647 | + gv[0] = M.getGlobalVariable("_work_dim"); |
3648 | + if (gv[0] != NULL) { |
3649 | + ai[0] = builder.CreateAlloca(gv[0]->getType()->getElementType(), |
3650 | + 0, "_work_dim"); |
3651 | + if(gv[0]->hasInitializer()) { |
3652 | + Constant *c = gv[0]->getInitializer(); |
3653 | + builder.CreateStore(c, ai[0]); |
3654 | + } |
3655 | + } |
3656 | + for (Function::iterator i = F->begin(), e = F->end(); i != e; ++i) { |
3657 | + for (BasicBlock::iterator ii = i->begin(), ee = i->end(); |
3658 | + ii != ee; ++ii) { |
3659 | + ii->replaceUsesOfWith(gv[0], ai[0]); |
3660 | + } |
3661 | + } |
3662 | + |
3663 | + // Privatize _num_groups |
3664 | + for (int i = 0; i < 3; ++i) { |
3665 | + snprintf(s, STRING_LENGTH, "_num_groups_%c", 'x' + i); |
3666 | + gv[i] = M.getGlobalVariable(s); |
3667 | + if (gv[i] != NULL) { |
3668 | + ai[i] = builder.CreateAlloca(gv[i]->getType()->getElementType(), |
3669 | + 0, s); |
3670 | + if(gv[i]->hasInitializer()) { |
3671 | + Constant *c = gv[i]->getInitializer(); |
3672 | + builder.CreateStore(c, ai[i]); |
3673 | + } |
3674 | + } |
3675 | + } |
3676 | + for (Function::iterator i = F->begin(), e = F->end(); i != e; ++i) { |
3677 | + for (BasicBlock::iterator ii = i->begin(), ee = i->end(); |
3678 | + ii != ee; ++ii) { |
3679 | + for (int j = 0; j < 3; ++j) |
3680 | + ii->replaceUsesOfWith(gv[j], ai[j]); |
3681 | + } |
3682 | + } |
3683 | + |
3684 | // Privatize _group_id |
3685 | for (int i = 0; i < 3; ++i) { |
3686 | snprintf(s, STRING_LENGTH, "_group_id_%c", 'x' + i); |
3687 | @@ -285,9 +353,9 @@ |
3688 | } |
3689 | } |
3690 | |
3691 | - // Privatize _num_groups |
3692 | + // Privatize _global_offset |
3693 | for (int i = 0; i < 3; ++i) { |
3694 | - snprintf(s, STRING_LENGTH, "_num_groups_%c", 'x' + i); |
3695 | + snprintf(s, STRING_LENGTH, "_global_offset_%c", 'x' + i); |
3696 | gv[i] = M.getGlobalVariable(s); |
3697 | if (gv[i] != NULL) { |
3698 | ai[i] = builder.CreateAlloca(gv[i]->getType()->getElementType(), |
3699 | |
3700 | === modified file 'tests/testsuite.at' |
3701 | --- tests/testsuite.at 2011-12-09 13:35:18 +0000 |
3702 | +++ tests/testsuite.at 2011-12-14 15:59:07 +0000 |
3703 | @@ -172,3 +172,34 @@ |
3704 | ]) |
3705 | AT_CHECK([$abs_top_builddir/examples/trig/trig], 0, expout) |
3706 | AT_CLEANUP |
3707 | + |
3708 | +AT_BANNER([Full applications]) |
3709 | + |
3710 | +AT_SETUP([Scalar wave equation]) |
3711 | +AT_DATA([expout], |
3712 | +[Initial condition: t=0 |
3713 | +Time step 1: t=0.03125 |
3714 | +Time step 2: t=0.0625 |
3715 | +Time step 3: t=0.09375 |
3716 | +Time step 4: t=0.125 |
3717 | +phi(0 ,0 ,0 ) = 0 |
3718 | +phi(0.0625 ,0.0625 ,0.0625 ) = 0.0057744 |
3719 | +phi(0.125 ,0.125 ,0.125 ) = 0.0435832 |
3720 | +phi(0.1875 ,0.1875 ,0.1875 ) = 0.133357 |
3721 | +phi(0.25 ,0.25 ,0.25 ) = 0.274951 |
3722 | +phi(0.3125 ,0.3125 ,0.3125 ) = 0.447032 |
3723 | +phi(0.375 ,0.375 ,0.375 ) = 0.613262 |
3724 | +phi(0.4375 ,0.4375 ,0.4375 ) = 0.733705 |
3725 | +phi(0.5 ,0.5 ,0.5 ) = 0.777678 |
3726 | +phi(0.5625 ,0.5625 ,0.5625 ) = 0.733705 |
3727 | +phi(0.625 ,0.625 ,0.625 ) = 0.613262 |
3728 | +phi(0.6875 ,0.6875 ,0.6875 ) = 0.447032 |
3729 | +phi(0.75 ,0.75 ,0.75 ) = 0.274951 |
3730 | +phi(0.8125 ,0.8125 ,0.8125 ) = 0.133357 |
3731 | +phi(0.875 ,0.875 ,0.875 ) = 0.0435832 |
3732 | +phi(0.9375 ,0.9375 ,0.9375 ) = 0.0057744 |
3733 | +phi(1 ,1 ,1 ) = 0 |
3734 | +Done. |
3735 | +]) |
3736 | +AT_CHECK([$abs_top_builddir/examples/scalarwave/scalarwave], 0, expout) |
3737 | +AT_CLEANUP |
get_global_offset() implementation seems wrong. I think it needs the separate values from clEnqueueNDRang eKernel( ) which are not stored yet in the context struct, IIRC.
In my opinion having failing tests in the test suite that reproduce (reported) bugs is fine but they should be marked with XFAIL (with a bug id maybe?) or similar until they are fixed so we won't get confused what is a regression and what a known bug reproduction (or if you are into TDD, just unimplemented features).
Other than these, seems good to merge for me.