Merge lp:~schnetter/pocl/pocl into lp:~pocl/pocl/trunk
- pocl
- Merge into trunk
Status: | Merged |
---|---|
Merged at revision: | 335 |
Proposed branch: | lp:~schnetter/pocl/pocl |
Merge into: | lp:~pocl/pocl/trunk |
Diff against target: |
1371 lines (+573/-231) 27 files modified
INSTALL (+5/-1) include/_kernel.h (+35/-0) lib/CL/clEnqueueCopyBuffer.c (+18/-2) lib/CL/clEnqueueCopyBufferRect.c (+44/-1) lib/CL/clEnqueueNDRangeKernel.c (+2/-0) lib/CL/clEnqueueReadBuffer.c (+15/-5) lib/CL/clEnqueueReadBufferRect.c (+52/-12) lib/CL/clEnqueueUnmapMemObject.c (+1/-1) lib/CL/clEnqueueWriteBuffer.c (+12/-17) lib/CL/clEnqueueWriteBufferRect.c (+32/-20) lib/CL/clFinish.c (+6/-1) lib/CL/clGetDeviceInfo.c (+21/-21) lib/CL/clGetKernelWorkGroupInfo.c (+1/-1) lib/CL/clReleaseProgram.c (+1/-1) lib/CL/devices/basic/basic.h (+10/-0) lib/CL/devices/common.h (+1/-0) lib/CL/devices/devices.c (+2/-2) lib/CL/devices/pthread/pocl-pthread.h (+10/-0) lib/CL/pocl_cl.h (+13/-1) lib/kernel/atomics.cl (+149/-0) lib/kernel/hadd.cl (+0/-2) lib/kernel/rhadd.cl (+0/-2) lib/kernel/sources.mk (+140/-138) lib/kernel/sub_sat.cl (+0/-2) lib/llvmopencl/Makefile.am (+1/-1) tests/regression/test_constant_array.cpp (+1/-0) tests/regression/test_infinite_loop.cpp (+1/-0) |
To merge this branch: | bzr merge lp:~schnetter/pocl/pocl |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
pocl maintaners | Pending | ||
Review via email:
|
Commit message
Description of the change
I corrected several build errors (and some cosmetic issues) for Mac OS X.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
I'm still relatively unfamiliar with bzr. How exactly would I "commit directly to 0.6"? I assume I check out the 0.6 branch first, and then what? Would "bzr merge -i" do the trick that allows me to select the fixes?
Should I commit my cosmetic changes as well, or should I commit those to the trunk only?
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlo
IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
I cherry-picked the changes into a local checkout of the 0.6 branch.
However, it appears as if I only received the changes, not the
commits? I now have a large number of "locally modified" files... Is
this to be expected? How should I handle this?
-erik
On Mon, Aug 13, 2012 at 11:33 PM, Pekka Jääskeläinen
<email address hidden> wrote:
> Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlo
>
> IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/
> --
> https:/
> You are the owner of lp:~schnetter/pocl/pocl.
--
Erik Schnetter <email address hidden>
http://
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
That's normal. The "cherry picking" of Bazaar is not perfect in this regard. http://
- 334. By Erik Schnetter
-
Correct several build errors for Mac OS X
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
I committed my changes (except adding the atomics) to the 0.6 branch, then merged the branch into the trunk.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
I can't see them in 0.6. Yep, better put the atomics to trunk, let's try to keep 0.6 in feature freeze.
- 335. By Erik Schnetter
-
Implement atomics
Preview Diff
1 | === modified file 'INSTALL' |
2 | --- INSTALL 2012-08-08 12:45:17 +0000 |
3 | +++ INSTALL 2012-08-15 20:36:22 +0000 |
4 | @@ -36,10 +36,14 @@ |
5 | ./configure --enable-shared --prefix=YOUR_INSTALLATION_PREFIX_HERE |
6 | make REQUIRES_RTTI=1 && make install |
7 | |
8 | - |
9 | !!!NOTE: A 64-bit host is at the moment required. See |
10 | https://bugs.launchpad.net/pocl/+bug/911911 |
11 | |
12 | +!!!NOTE: On Mac OS X, you cannot build LLVM with LLVM; you need to |
13 | +build using gcc. Use e.g. this command to configure instead: |
14 | + |
15 | + ./configure --enable-shared CC=gcc CXX=g++ --prefix=YOUR_INSTALLATION_PREFIX_HERE |
16 | + |
17 | After all the requirements are installed. The installation procedure |
18 | follows the usual autotools build+install. If you are using a development |
19 | source tree, you need to generate the autotool build files with |
20 | |
21 | === modified file 'include/_kernel.h' |
22 | --- include/_kernel.h 2012-05-25 16:05:51 +0000 |
23 | +++ include/_kernel.h 2012-08-15 20:36:22 +0000 |
24 | @@ -1608,6 +1608,41 @@ |
25 | #endif |
26 | |
27 | |
28 | |
29 | +/* Atomic operations */ |
30 | + |
31 | +#define _CL_DECLARE_ATOMICS(MOD, TYPE) \ |
32 | + _cl_overloadable TYPE atomic_add (volatile MOD TYPE *p, TYPE val); \ |
33 | + _cl_overloadable TYPE atomic_sub (volatile MOD TYPE *p, TYPE val); \ |
34 | + _cl_overloadable TYPE atomic_xchg (volatile MOD TYPE *p, TYPE val); \ |
35 | + _cl_overloadable TYPE atomic_inc (volatile MOD TYPE *p); \ |
36 | + _cl_overloadable TYPE atomic_dec (volatile MOD TYPE *p); \ |
37 | + _cl_overloadable TYPE atomic_cmpxchg(volatile MOD TYPE *p, TYPE cmp, TYPE val); \ |
38 | + _cl_overloadable TYPE atomic_min (volatile MOD TYPE *p, TYPE val); \ |
39 | + _cl_overloadable TYPE atomic_max (volatile MOD TYPE *p, TYPE val); \ |
40 | + _cl_overloadable TYPE atomic_and (volatile MOD TYPE *p, TYPE val); \ |
41 | + _cl_overloadable TYPE atomic_or (volatile MOD TYPE *p, TYPE val); \ |
42 | + _cl_overloadable TYPE atomic_xor (volatile MOD TYPE *p, TYPE val); |
43 | +_CL_DECLARE_ATOMICS(__global, int ) |
44 | +_CL_DECLARE_ATOMICS(__global, uint) |
45 | +_CL_DECLARE_ATOMICS(__local , int ) |
46 | +_CL_DECLARE_ATOMICS(__local , uint) |
47 | + |
48 | +_cl_overloadable float atomic_xchg(volatile __global float *p, float val); |
49 | +_cl_overloadable float atomic_xchg(volatile __local float *p, float val); |
50 | + |
51 | +#define atom_add atomic_add |
52 | +#define atom_sub atomic_sub |
53 | +#define atom_xchg atomic_xchg |
54 | +#define atom_inc atomic_inc |
55 | +#define atom_dec atomic_dec |
56 | +#define atom_cmpxchg atomic_cmpxchg |
57 | +#define atom_min atomic_min |
58 | +#define atom_max atomic_max |
59 | +#define atom_and atomic_and |
60 | +#define atom_or atomic_or |
61 | +#define atom_xor atomic_xor |
62 | + |
63 | + |
64 | |
65 | /* Miscellaneous Vector Functions */ |
66 | |
67 | // This code leads to an ICE in Clang |
68 | |
69 | === modified file 'lib/CL/clEnqueueCopyBuffer.c' |
70 | --- lib/CL/clEnqueueCopyBuffer.c 2012-05-14 11:45:48 +0000 |
71 | +++ lib/CL/clEnqueueCopyBuffer.c 2012-08-15 20:36:22 +0000 |
72 | @@ -23,6 +23,7 @@ |
73 | */ |
74 | |
75 | #include "pocl_cl.h" |
76 | +#include "pocl_icd.h" |
77 | #include "utlist.h" |
78 | #include <assert.h> |
79 | |
80 | @@ -55,14 +56,28 @@ |
81 | return CL_INVALID_VALUE; |
82 | |
83 | device_id = command_queue->device; |
84 | + |
85 | for (i = 0; i < command_queue->context->num_devices; ++i) |
86 | { |
87 | if (command_queue->context->devices[i] == device_id) |
88 | break; |
89 | } |
90 | - |
91 | assert(i < command_queue->context->num_devices); |
92 | |
93 | + if (event != NULL) |
94 | + { |
95 | + *event = (cl_event)malloc(sizeof(struct _cl_event)); |
96 | + if (*event == NULL) |
97 | + return CL_OUT_OF_HOST_MEMORY; |
98 | + POCL_INIT_OBJECT(*event); |
99 | + (*event)->queue = command_queue; |
100 | + POCL_INIT_ICD_OBJECT(*event); |
101 | + clRetainCommandQueue (command_queue); |
102 | + |
103 | + POCL_PROFILE_QUEUED; |
104 | + } |
105 | + |
106 | + |
107 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); |
108 | if (cmd == NULL) |
109 | return CL_OUT_OF_HOST_MEMORY; |
110 | @@ -81,8 +96,9 @@ |
111 | cmd->command.copy.dst_ptr = dst_buffer->device_ptrs[device_id->dev_id] + dst_offset; |
112 | cmd->command.copy.cb = cb; |
113 | cmd->next = NULL; |
114 | + cmd->event = event ? *event : NULL; |
115 | |
116 | - LL_APPEND(command_queue->root, cmd ); |
117 | + LL_APPEND(command_queue->root, cmd); |
118 | |
119 | return CL_SUCCESS; |
120 | } |
121 | |
122 | === modified file 'lib/CL/clEnqueueCopyBufferRect.c' |
123 | --- lib/CL/clEnqueueCopyBufferRect.c 2012-05-14 11:45:48 +0000 |
124 | +++ lib/CL/clEnqueueCopyBufferRect.c 2012-08-15 20:36:22 +0000 |
125 | @@ -22,6 +22,7 @@ |
126 | */ |
127 | |
128 | #include "pocl_cl.h" |
129 | +#include "pocl_icd.h" |
130 | #include <assert.h> |
131 | |
132 | CL_API_ENTRY cl_int CL_API_CALL |
133 | @@ -69,14 +70,51 @@ |
134 | return CL_INVALID_VALUE; |
135 | |
136 | device_id = command_queue->device; |
137 | + |
138 | for (i = 0; i < command_queue->context->num_devices; ++i) |
139 | { |
140 | if (command_queue->context->devices[i] == device_id) |
141 | break; |
142 | } |
143 | - |
144 | assert(i < command_queue->context->num_devices); |
145 | |
146 | + if (event != NULL) |
147 | + { |
148 | + *event = (cl_event)malloc(sizeof(struct _cl_event)); |
149 | + if (*event == NULL) |
150 | + return CL_OUT_OF_HOST_MEMORY; |
151 | + POCL_INIT_OBJECT(*event); |
152 | + (*event)->queue = command_queue; |
153 | + POCL_INIT_ICD_OBJECT(*event); |
154 | + |
155 | + clRetainCommandQueue (command_queue); |
156 | + |
157 | + POCL_PROFILE_QUEUED; |
158 | + } |
159 | + |
160 | + |
161 | + /* execute directly */ |
162 | + /* TODO: enqueue the read_rect if this is a non-blocking read (see |
163 | + clEnqueueReadBuffer) */ |
164 | + if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) |
165 | + { |
166 | + /* wait for the event in event_wait_list to finish */ |
167 | + POCL_ABORT_UNIMPLEMENTED(); |
168 | + } |
169 | + else |
170 | + { |
171 | + /* in-order queue - all previously enqueued commands must |
172 | + * finish before this read */ |
173 | + // ensure our buffer is not freed yet |
174 | + clRetainMemObject (src_buffer); |
175 | + clRetainMemObject (dst_buffer); |
176 | + clFinish(command_queue); |
177 | + } |
178 | + POCL_PROFILE_SUBMITTED; |
179 | + POCL_PROFILE_RUNNING; |
180 | + |
181 | + /* TODO: offset computation doesn't work in case the ptr is not |
182 | + a direct pointer */ |
183 | device_id->copy_rect(device_id->data, |
184 | src_buffer->device_ptrs[device_id->dev_id], |
185 | dst_buffer->device_ptrs[device_id->dev_id], |
186 | @@ -84,5 +122,10 @@ |
187 | src_row_pitch, src_slice_pitch, |
188 | dst_row_pitch, dst_slice_pitch); |
189 | |
190 | + POCL_PROFILE_COMPLETE; |
191 | + |
192 | + clReleaseMemObject (src_buffer); |
193 | + clReleaseMemObject (dst_buffer); |
194 | + |
195 | return CL_SUCCESS; |
196 | } |
197 | |
198 | === modified file 'lib/CL/clEnqueueNDRangeKernel.c' |
199 | --- lib/CL/clEnqueueNDRangeKernel.c 2012-05-30 14:10:44 +0000 |
200 | +++ lib/CL/clEnqueueNDRangeKernel.c 2012-08-15 20:36:22 +0000 |
201 | @@ -314,6 +314,8 @@ |
202 | } |
203 | } |
204 | |
205 | + command_node->event = event ? *event : NULL; |
206 | + |
207 | LL_APPEND(command_queue->root, command_node); |
208 | |
209 | return CL_SUCCESS; |
210 | |
211 | === modified file 'lib/CL/clEnqueueReadBuffer.c' |
212 | --- lib/CL/clEnqueueReadBuffer.c 2012-05-29 12:16:56 +0000 |
213 | +++ lib/CL/clEnqueueReadBuffer.c 2012-08-15 20:36:22 +0000 |
214 | @@ -38,6 +38,7 @@ |
215 | cl_event *event) CL_API_SUFFIX__VERSION_1_0 |
216 | { |
217 | cl_device_id device; |
218 | + unsigned i; |
219 | |
220 | if (command_queue == NULL) |
221 | return CL_INVALID_COMMAND_QUEUE; |
222 | @@ -54,6 +55,13 @@ |
223 | |
224 | device = command_queue->device; |
225 | |
226 | + for (i = 0; i < command_queue->context->num_devices; ++i) |
227 | + { |
228 | + if (command_queue->context->devices[i] == device) |
229 | + break; |
230 | + } |
231 | + assert(i < command_queue->context->num_devices); |
232 | + |
233 | if (event != NULL) |
234 | { |
235 | *event = (cl_event)malloc(sizeof(struct _cl_event)); |
236 | @@ -62,14 +70,15 @@ |
237 | POCL_INIT_OBJECT(*event); |
238 | (*event)->queue = command_queue; |
239 | POCL_INIT_ICD_OBJECT(*event); |
240 | - |
241 | clRetainCommandQueue (command_queue); |
242 | |
243 | - POCL_PROFILE_QUEUED; |
244 | + POCL_PROFILE_QUEUED; |
245 | } |
246 | |
247 | |
248 | /* enqueue the read, or execute directly */ |
249 | + /* TODO: why do we implement both? direct execution seems |
250 | + unnecessary. */ |
251 | if (blocking_read) |
252 | { |
253 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) |
254 | @@ -101,16 +110,17 @@ |
255 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); |
256 | if (cmd == NULL) |
257 | return CL_OUT_OF_HOST_MEMORY; |
258 | - |
259 | + |
260 | cmd->type = CL_COMMAND_TYPE_READ; |
261 | cmd->command.read.data = device->data; |
262 | cmd->command.read.host_ptr = ptr; |
263 | cmd->command.read.device_ptr = buffer->device_ptrs[device->dev_id]+offset; |
264 | cmd->command.read.cb = cb; |
265 | + cmd->command.read.buffer = buffer; |
266 | cmd->next = NULL; |
267 | - cmd->command.read.buffer = buffer; |
268 | + cmd->event = event ? *event : NULL; |
269 | clRetainMemObject (buffer); |
270 | - LL_APPEND(command_queue->root, cmd ); |
271 | + LL_APPEND(command_queue->root, cmd); |
272 | } |
273 | |
274 | return CL_SUCCESS; |
275 | |
276 | === modified file 'lib/CL/clEnqueueReadBufferRect.c' |
277 | --- lib/CL/clEnqueueReadBufferRect.c 2012-05-25 16:05:51 +0000 |
278 | +++ lib/CL/clEnqueueReadBufferRect.c 2012-08-15 20:36:22 +0000 |
279 | @@ -22,6 +22,7 @@ |
280 | */ |
281 | |
282 | #include "pocl_cl.h" |
283 | +#include "pocl_icd.h" |
284 | #include <assert.h> |
285 | #include <stdio.h> |
286 | |
287 | @@ -41,7 +42,7 @@ |
288 | const cl_event *event_wait_list, |
289 | cl_event *event) CL_API_SUFFIX__VERSION_1_1 |
290 | { |
291 | - cl_device_id device_id; |
292 | + cl_device_id device; |
293 | unsigned i; |
294 | |
295 | if (command_queue == NULL) |
296 | @@ -58,28 +59,67 @@ |
297 | (host_origin == NULL) || |
298 | (region == NULL)) |
299 | return CL_INVALID_VALUE; |
300 | - |
301 | + |
302 | if ((region[0]*region[1]*region[2] > 0) && |
303 | (buffer_origin[0] + region[0]-1 + |
304 | buffer_row_pitch * (buffer_origin[1] + region[1]-1) + |
305 | buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size)) |
306 | return CL_INVALID_VALUE; |
307 | |
308 | - device_id = command_queue->device; |
309 | + device = command_queue->device; |
310 | + |
311 | for (i = 0; i < command_queue->context->num_devices; ++i) |
312 | { |
313 | - if (command_queue->context->devices[i] == device_id) |
314 | - break; |
315 | + if (command_queue->context->devices[i] == device) |
316 | + break; |
317 | } |
318 | - |
319 | assert(i < command_queue->context->num_devices); |
320 | |
321 | - device_id->read_rect(device_id->data, ptr, |
322 | - buffer->device_ptrs[device_id->dev_id], |
323 | - buffer_origin, host_origin, region, |
324 | - buffer_row_pitch, buffer_slice_pitch, |
325 | - host_row_pitch, host_slice_pitch); |
326 | - |
327 | + if (event != NULL) |
328 | + { |
329 | + *event = (cl_event)malloc(sizeof(struct _cl_event)); |
330 | + if (*event == NULL) |
331 | + return CL_OUT_OF_HOST_MEMORY; |
332 | + POCL_INIT_OBJECT(*event); |
333 | + (*event)->queue = command_queue; |
334 | + POCL_INIT_ICD_OBJECT(*event); |
335 | + |
336 | + clRetainCommandQueue (command_queue); |
337 | + |
338 | + POCL_PROFILE_QUEUED; |
339 | + } |
340 | + |
341 | + |
342 | + /* execute directly */ |
343 | + /* TODO: enqueue the read_rect if this is a non-blocking read (see |
344 | + clEnqueueReadBuffer) */ |
345 | + if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) |
346 | + { |
347 | + /* wait for the event in event_wait_list to finish */ |
348 | + POCL_ABORT_UNIMPLEMENTED(); |
349 | + } |
350 | + else |
351 | + { |
352 | + /* in-order queue - all previously enqueued commands must |
353 | + * finish before this read */ |
354 | + // ensure our buffer is not freed yet |
355 | + clRetainMemObject (buffer); |
356 | + clFinish(command_queue); |
357 | + } |
358 | + POCL_PROFILE_SUBMITTED; |
359 | + POCL_PROFILE_RUNNING; |
360 | + |
361 | + /* TODO: offset computation doesn't work in case the ptr is not |
362 | + a direct pointer */ |
363 | + device->read_rect(device->data, ptr, |
364 | + buffer->device_ptrs[device->dev_id], |
365 | + buffer_origin, host_origin, region, |
366 | + buffer_row_pitch, buffer_slice_pitch, |
367 | + host_row_pitch, host_slice_pitch); |
368 | + |
369 | + POCL_PROFILE_COMPLETE; |
370 | + |
371 | + clReleaseMemObject (buffer); |
372 | |
373 | return CL_SUCCESS; |
374 | } |
375 | |
376 | === modified file 'lib/CL/clEnqueueUnmapMemObject.c' |
377 | --- lib/CL/clEnqueueUnmapMemObject.c 2012-05-29 12:16:56 +0000 |
378 | +++ lib/CL/clEnqueueUnmapMemObject.c 2012-08-15 20:36:22 +0000 |
379 | @@ -104,6 +104,6 @@ |
380 | DL_DELETE(memobj->mappings, mapping); |
381 | memobj->map_count--; |
382 | clReleaseMemObject (memobj); |
383 | - event = NULL; |
384 | + |
385 | return CL_SUCCESS; |
386 | } |
387 | |
388 | === modified file 'lib/CL/clEnqueueWriteBuffer.c' |
389 | --- lib/CL/clEnqueueWriteBuffer.c 2012-05-29 12:16:56 +0000 |
390 | +++ lib/CL/clEnqueueWriteBuffer.c 2012-08-15 20:36:22 +0000 |
391 | @@ -37,7 +37,7 @@ |
392 | const cl_event *event_wait_list, |
393 | cl_event *event) CL_API_SUFFIX__VERSION_1_0 |
394 | { |
395 | - cl_device_id device_id; |
396 | + cl_device_id device; |
397 | unsigned i; |
398 | |
399 | if (command_queue == NULL) |
400 | @@ -53,13 +53,13 @@ |
401 | (offset + cb > buffer->size)) |
402 | return CL_INVALID_VALUE; |
403 | |
404 | - device_id = command_queue->device; |
405 | + device = command_queue->device; |
406 | + |
407 | for (i = 0; i < command_queue->context->num_devices; ++i) |
408 | { |
409 | - if (command_queue->context->devices[i] == device_id) |
410 | - break; |
411 | + if (command_queue->context->devices[i] == device) |
412 | + break; |
413 | } |
414 | - |
415 | assert(i < command_queue->context->num_devices); |
416 | |
417 | if (event != NULL) |
418 | @@ -76,6 +76,8 @@ |
419 | } |
420 | |
421 | /* enqueue the write, or execute directly */ |
422 | + /* TODO: why do we implement both? direct execution seems |
423 | + unnecessary. */ |
424 | if (blocking_write) |
425 | { |
426 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) |
427 | @@ -94,17 +96,9 @@ |
428 | POCL_PROFILE_SUBMITTED; |
429 | POCL_PROFILE_RUNNING; |
430 | /* TODO: fixme. The offset computation must be done at the device driver. */ |
431 | - device_id->write(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id]+offset, cb); |
432 | + device->write(device->data, ptr, buffer->device_ptrs[device->dev_id]+offset, cb); |
433 | POCL_PROFILE_COMPLETE; |
434 | |
435 | - if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && |
436 | - event != NULL) |
437 | - { |
438 | - (*event)->status = CL_COMPLETE; |
439 | - (*event)->time_end = |
440 | - command_queue->device->get_timer_value(command_queue->device->data); |
441 | - } |
442 | - |
443 | clReleaseMemObject (buffer); |
444 | } |
445 | else |
446 | @@ -112,14 +106,15 @@ |
447 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); |
448 | if (cmd == NULL) |
449 | return CL_OUT_OF_HOST_MEMORY; |
450 | - |
451 | + |
452 | cmd->type = CL_COMMAND_TYPE_WRITE; |
453 | - cmd->command.write.data = device_id->data; |
454 | + cmd->command.write.data = device->data; |
455 | cmd->command.write.host_ptr = ptr; |
456 | cmd->command.write.device_ptr = buffer->device_ptrs[i]+offset; |
457 | cmd->command.write.cb = cb; |
458 | + cmd->command.write.buffer = buffer; |
459 | cmd->next = NULL; |
460 | - cmd->command.write.buffer = buffer; |
461 | + cmd->event = event ? *event : NULL; |
462 | clRetainMemObject (buffer); |
463 | |
464 | LL_APPEND(command_queue->root, cmd); |
465 | |
466 | === modified file 'lib/CL/clEnqueueWriteBufferRect.c' |
467 | --- lib/CL/clEnqueueWriteBufferRect.c 2012-05-14 11:45:48 +0000 |
468 | +++ lib/CL/clEnqueueWriteBufferRect.c 2012-08-15 20:36:22 +0000 |
469 | @@ -40,7 +40,7 @@ |
470 | const cl_event *event_wait_list, |
471 | cl_event *event) CL_API_SUFFIX__VERSION_1_1 |
472 | { |
473 | - cl_device_id device_id; |
474 | + cl_device_id device; |
475 | unsigned i; |
476 | |
477 | if (command_queue == NULL) |
478 | @@ -57,43 +57,55 @@ |
479 | (host_origin == NULL) || |
480 | (region == NULL)) |
481 | return CL_INVALID_VALUE; |
482 | - |
483 | + |
484 | if ((region[0]*region[1]*region[2] > 0) && |
485 | (buffer_origin[0] + region[0]-1 + |
486 | buffer_row_pitch * (buffer_origin[1] + region[1]-1) + |
487 | buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size)) |
488 | { |
489 | POCL_ABORT_UNIMPLEMENTED(); |
490 | -#if 0 |
491 | - printf("bo=[%d,%d,%d]\n" |
492 | - "ho=[%d,%d,%d]\n" |
493 | - "re=[%d,%d,%d]\n" |
494 | - "bp=[,%d,%d]\n" |
495 | - "hp=[,%d,%d]\n" |
496 | - "bs=[%d]\n", |
497 | - (int)buffer_origin[0], (int)buffer_origin[1], (int)buffer_origin[2], |
498 | - (int)host_origin[0], (int)host_origin[1], (int)host_origin[2], |
499 | - (int)region[0], (int)region[1], (int)region[2], |
500 | - (int)buffer_row_pitch, (int)buffer_slice_pitch, |
501 | - (int)host_row_pitch, (int)host_slice_pitch, |
502 | - (int)buffer->size); |
503 | -#endif |
504 | return CL_INVALID_VALUE; |
505 | } |
506 | |
507 | - device_id = command_queue->device; |
508 | + device = command_queue->device; |
509 | + |
510 | for (i = 0; i < command_queue->context->num_devices; ++i) |
511 | { |
512 | - if (command_queue->context->devices[i] == device_id) |
513 | + if (command_queue->context->devices[i] == device) |
514 | break; |
515 | } |
516 | - |
517 | assert(i < command_queue->context->num_devices); |
518 | |
519 | - device_id->write_rect(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id], |
520 | + |
521 | + /* execute directly */ |
522 | + /* TODO: enqueue the write_rect if this is a non-blocking read (see |
523 | + clEnqueueWriteBuffer) */ |
524 | + if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) |
525 | + { |
526 | + /* wait for the event in event_wait_list to finish */ |
527 | + POCL_ABORT_UNIMPLEMENTED(); |
528 | + } |
529 | + else |
530 | + { |
531 | + /* in-order queue - all previously enqueued commands must |
532 | + * finish before this read */ |
533 | + // ensure our buffer is not freed yet |
534 | + clRetainMemObject (buffer); |
535 | + clFinish(command_queue); |
536 | + } |
537 | + POCL_PROFILE_SUBMITTED; |
538 | + POCL_PROFILE_RUNNING; |
539 | + |
540 | + /* TODO: offset computation doesn't work in case the ptr is not |
541 | + a direct pointer */ |
542 | + device->write_rect(device->data, ptr, buffer->device_ptrs[device->dev_id], |
543 | buffer_origin, host_origin, region, |
544 | buffer_row_pitch, buffer_slice_pitch, |
545 | host_row_pitch, host_slice_pitch); |
546 | |
547 | + POCL_PROFILE_COMPLETE; |
548 | + |
549 | + clReleaseMemObject (buffer); |
550 | + |
551 | return CL_SUCCESS; |
552 | } |
553 | |
554 | === modified file 'lib/CL/clFinish.c' |
555 | --- lib/CL/clFinish.c 2012-05-31 12:01:27 +0000 |
556 | +++ lib/CL/clFinish.c 2012-08-15 20:36:22 +0000 |
557 | @@ -41,6 +41,7 @@ |
558 | { |
559 | case CL_COMMAND_TYPE_READ: |
560 | POCL_PROFILE_SUBMITTED; |
561 | + POCL_PROFILE_RUNNING; |
562 | command_queue->device->read |
563 | (node->command.read.data, |
564 | node->command.read.host_ptr, |
565 | @@ -51,6 +52,7 @@ |
566 | break; |
567 | case CL_COMMAND_TYPE_WRITE: |
568 | POCL_PROFILE_SUBMITTED; |
569 | + POCL_PROFILE_RUNNING; |
570 | command_queue->device->write |
571 | (node->command.write.data, |
572 | node->command.write.host_ptr, |
573 | @@ -61,6 +63,7 @@ |
574 | break; |
575 | case CL_COMMAND_TYPE_COPY: |
576 | POCL_PROFILE_SUBMITTED; |
577 | + POCL_PROFILE_RUNNING; |
578 | command_queue->device->copy |
579 | (node->command.copy.data, |
580 | node->command.copy.src_ptr, |
581 | @@ -71,9 +74,11 @@ |
582 | clReleaseMemObject (node->command.copy.dst_buffer); |
583 | break; |
584 | case CL_COMMAND_TYPE_RUN: |
585 | - POCL_PROFILE_SUBMITTED; |
586 | assert (*event == node->event); |
587 | + POCL_PROFILE_SUBMITTED; |
588 | + POCL_PROFILE_RUNNING; |
589 | command_queue->device->run(node->command.run.data, node); |
590 | + POCL_PROFILE_COMPLETE; |
591 | for (i = 0; i < node->command.run.arg_buffer_count; ++i) |
592 | { |
593 | cl_mem buf = node->command.run.arg_buffers[i]; |
594 | |
595 | === modified file 'lib/CL/clGetDeviceInfo.c' |
596 | --- lib/CL/clGetDeviceInfo.c 2012-08-08 12:45:17 +0000 |
597 | +++ lib/CL/clGetDeviceInfo.c 2012-08-15 20:36:22 +0000 |
598 | @@ -126,23 +126,17 @@ |
599 | } |
600 | |
601 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: |
602 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char); |
603 | - |
604 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char); |
605 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: |
606 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short); |
607 | - |
608 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: |
609 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int); |
610 | - |
611 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: |
612 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long); |
613 | - |
614 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: |
615 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float); |
616 | - |
617 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: |
618 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double); |
619 | - |
620 | case CL_DEVICE_MAX_CLOCK_FREQUENCY : |
621 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_clock_frequency); |
622 | case CL_DEVICE_ADDRESS_BITS : |
623 | @@ -186,7 +180,7 @@ |
624 | case CL_DEVICE_MAX_CONSTANT_ARGS : |
625 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_constant_args); |
626 | case CL_DEVICE_LOCAL_MEM_TYPE : |
627 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_size); |
628 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_type); |
629 | case CL_DEVICE_LOCAL_MEM_SIZE: |
630 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->local_mem_size); |
631 | case CL_DEVICE_ERROR_CORRECTION_SUPPORT : |
632 | @@ -218,28 +212,34 @@ |
633 | |
634 | case CL_DEVICE_EXTENSIONS : |
635 | POCL_RETURN_DEVICE_INFO_STR("cl_khr_fp16"); |
636 | - case CL_DEVICE_PLATFORM : break; |
637 | - case CL_DEVICE_DOUBLE_FP_CONFIG : break; |
638 | + case CL_DEVICE_PLATFORM : |
639 | + { |
640 | + /* Return the first platform id, assuming this is the only |
641 | + platform id (which is currently always the case for pocl) */ |
642 | + cl_platform_id platform_id; |
643 | + clGetPlatformIDs(1, &platform_id, NULL); |
644 | + POCL_RETURN_DEVICE_INFO(cl_platform_id, platform_id); |
645 | + } |
646 | + case CL_DEVICE_DOUBLE_FP_CONFIG : |
647 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->double_fp_config); |
648 | case CL_DEVICE_HALF_FP_CONFIG : break; |
649 | - case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF : break; |
650 | + case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF : |
651 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_half); |
652 | case CL_DEVICE_HOST_UNIFIED_MEMORY : break; |
653 | - /* TODO: figure out what the difference between preferred and native |
654 | - widths are. And why there is no struct fields 'native_vector...' */ |
655 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR : |
656 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char); |
657 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_char); |
658 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT : |
659 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short); |
660 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_short); |
661 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_INT : |
662 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int); |
663 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_int); |
664 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG : |
665 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long); |
666 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_long); |
667 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT : |
668 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float); |
669 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_float); |
670 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE : |
671 | - POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double); |
672 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_double); |
673 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF : |
674 | - /* TODO: why is there no preferred_vector_width_half? */ |
675 | - POCL_RETURN_DEVICE_INFO(cl_uint, 0); |
676 | + POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_half); |
677 | case CL_DEVICE_OPENCL_C_VERSION : |
678 | POCL_RETURN_DEVICE_INFO_STR("1.2"); |
679 | } |
680 | |
681 | === modified file 'lib/CL/clGetKernelWorkGroupInfo.c' |
682 | --- lib/CL/clGetKernelWorkGroupInfo.c 2012-03-28 11:25:45 +0000 |
683 | +++ lib/CL/clGetKernelWorkGroupInfo.c 2012-08-15 20:36:22 +0000 |
684 | @@ -65,7 +65,7 @@ |
685 | POCL_ABORT_UNIMPLEMENTED(); |
686 | |
687 | case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: |
688 | - POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple); |
689 | + POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple); |
690 | |
691 | case CL_KERNEL_LOCAL_MEM_SIZE: |
692 | POCL_RETURN_KERNEL_WG_INFO(cl_ulong, device->local_mem_size); |
693 | |
694 | === modified file 'lib/CL/clReleaseProgram.c' |
695 | --- lib/CL/clReleaseProgram.c 2012-05-31 12:01:27 +0000 |
696 | +++ lib/CL/clReleaseProgram.c 2012-08-15 20:36:22 +0000 |
697 | @@ -58,7 +58,7 @@ |
698 | free (program->binary_sizes); |
699 | |
700 | env = getenv ("POCL_LEAVE_TEMP_DIRS"); |
701 | - if (!(env != NULL && strlen (env) == 1 && env[0] == '1') && |
702 | + if (!(env != NULL && strcmp (env, "1") == 0) && |
703 | getenv("POCL_TEMP_DIR") == NULL) |
704 | { |
705 | remove_directory (program->temp_dir); |
706 | |
707 | === modified file 'lib/CL/devices/basic/basic.h' |
708 | --- lib/CL/devices/basic/basic.h 2012-08-08 14:54:32 +0000 |
709 | +++ lib/CL/devices/basic/basic.h 2012-08-15 20:36:22 +0000 |
710 | @@ -46,6 +46,15 @@ |
711 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ |
712 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ |
713 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ |
714 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \ |
715 | + /* TODO: figure out what the difference between preferred and native widths are. */ \ |
716 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \ |
717 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \ |
718 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \ |
719 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ |
720 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ |
721 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ |
722 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \ |
723 | 0, /* max_clock_frequency */ \ |
724 | 0, /* address_bits */ \ |
725 | 0, /* max_mem_alloc_size */ \ |
726 | @@ -62,6 +71,7 @@ |
727 | 0, /* mem_base_addr_align */ \ |
728 | 0, /* min_data_type_align_size */ \ |
729 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \ |
730 | + CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \ |
731 | CL_NONE, /* global_mem_cache_type */ \ |
732 | 0, /* global_mem_cacheline_size */ \ |
733 | 0, /* global_mem_cache_size */ \ |
734 | |
735 | === modified file 'lib/CL/devices/common.h' |
736 | --- lib/CL/devices/common.h 2012-04-24 13:12:25 +0000 |
737 | +++ lib/CL/devices/common.h 2012-08-15 20:36:22 +0000 |
738 | @@ -70,5 +70,6 @@ |
739 | #endif |
740 | /* Half is internally represented as short */ |
741 | #define POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT |
742 | +#define POCL_DEVICES_NATIVE_VECTOR_WIDTH_HALF POCL_DEVICES_NATIVE_VECTOR_WIDTH_SHORT |
743 | |
744 | #endif |
745 | |
746 | === modified file 'lib/CL/devices/devices.c' |
747 | --- lib/CL/devices/devices.c 2012-05-31 12:01:27 +0000 |
748 | +++ lib/CL/devices/devices.c 2012-08-15 20:36:22 +0000 |
749 | @@ -57,7 +57,7 @@ |
750 | pocl_init_devices() |
751 | { |
752 | const char *device_list; |
753 | - char *ptr, *tofree, *token, *saveptr, *saveptr2; |
754 | + char *ptr, *tofree, *token, *saveptr; |
755 | int i, devcount; |
756 | if (pocl_num_devices > 0) |
757 | return; |
758 | @@ -83,7 +83,7 @@ |
759 | |
760 | ptr = tofree = strdup(device_list); |
761 | devcount = 0; |
762 | - while ((token = strtok_r (ptr, " ", &saveptr2)) != NULL) |
763 | + while ((token = strtok_r (ptr, " ", &saveptr)) != NULL) |
764 | { |
765 | struct _cl_device_id* device_type = NULL; |
766 | |
767 | |
768 | === modified file 'lib/CL/devices/pthread/pocl-pthread.h' |
769 | --- lib/CL/devices/pthread/pocl-pthread.h 2012-08-08 14:54:32 +0000 |
770 | +++ lib/CL/devices/pthread/pocl-pthread.h 2012-08-15 20:36:22 +0000 |
771 | @@ -50,6 +50,15 @@ |
772 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ |
773 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ |
774 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ |
775 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF, /* preferred_vector_width_half */ \ |
776 | + /* TODO: figure out what the difference between preferred and native widths are. */ \ |
777 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \ |
778 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \ |
779 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \ |
780 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ |
781 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ |
782 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ |
783 | + POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \ |
784 | 0, /* max_clock_frequency */ \ |
785 | 0, /* address_bits */ \ |
786 | 0, /* max_mem_alloc_size */ \ |
787 | @@ -66,6 +75,7 @@ |
788 | 0, /* mem_base_addr_align */ \ |
789 | 0, /* min_data_type_align_size */ \ |
790 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \ |
791 | + CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \ |
792 | CL_NONE, /* global_mem_cache_type */ \ |
793 | 0, /* global_mem_cacheline_size */ \ |
794 | 0, /* global_mem_cache_size */ \ |
795 | |
796 | === modified file 'lib/CL/pocl_cl.h' |
797 | --- lib/CL/pocl_cl.h 2012-08-08 14:54:32 +0000 |
798 | +++ lib/CL/pocl_cl.h 2012-08-15 20:36:22 +0000 |
799 | @@ -25,6 +25,7 @@ |
800 | #define POCL_CL_H |
801 | |
802 | #include "config.h" |
803 | +#include <assert.h> |
804 | #include <stdio.h> |
805 | #include <ltdl.h> |
806 | #include <pthread.h> |
807 | @@ -150,6 +151,14 @@ |
808 | cl_uint preferred_vector_width_long; |
809 | cl_uint preferred_vector_width_float; |
810 | cl_uint preferred_vector_width_double; |
811 | + cl_uint preferred_vector_width_half; |
812 | + cl_uint native_vector_width_char; |
813 | + cl_uint native_vector_width_short; |
814 | + cl_uint native_vector_width_int; |
815 | + cl_uint native_vector_width_long; |
816 | + cl_uint native_vector_width_float; |
817 | + cl_uint native_vector_width_double; |
818 | + cl_uint native_vector_width_half; |
819 | cl_uint max_clock_frequency; |
820 | cl_uint address_bits; |
821 | cl_ulong max_mem_alloc_size; |
822 | @@ -166,6 +175,7 @@ |
823 | cl_uint mem_base_addr_align; |
824 | cl_uint min_data_type_align_size; |
825 | cl_device_fp_config single_fp_config; |
826 | + cl_device_fp_config double_fp_config; |
827 | cl_device_mem_cache_type global_mem_cache_type; |
828 | cl_uint global_mem_cacheline_size; |
829 | cl_ulong global_mem_cache_size; |
830 | @@ -394,12 +404,12 @@ |
831 | } \ |
832 | } while (0) \ |
833 | |
834 | - |
835 | #define POCL_PROFILE_SUBMITTED \ |
836 | do { \ |
837 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ |
838 | event != NULL && (*event) != NULL) \ |
839 | { \ |
840 | + assert((*event)->status = CL_QUEUED); \ |
841 | (*event)->status = CL_SUBMITTED; \ |
842 | (*event)->time_submit = \ |
843 | command_queue->device->get_timer_value(command_queue->device->data); \ |
844 | @@ -411,6 +421,7 @@ |
845 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ |
846 | event != NULL && (*event) != NULL) \ |
847 | { \ |
848 | + assert((*event)->status = CL_SUBMITTED); \ |
849 | (*event)->status = CL_RUNNING; \ |
850 | (*event)->time_start = \ |
851 | command_queue->device->get_timer_value(command_queue->device->data); \ |
852 | @@ -422,6 +433,7 @@ |
853 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ |
854 | event != NULL && (*event) != NULL) \ |
855 | { \ |
856 | + assert((*event)->status = CL_RUNNING); \ |
857 | (*event)->status = CL_COMPLETE; \ |
858 | (*event)->time_end = \ |
859 | command_queue->device->get_timer_value(command_queue->device->data); \ |
860 | |
861 | === added file 'lib/kernel/atomics.cl' |
862 | --- lib/kernel/atomics.cl 1970-01-01 00:00:00 +0000 |
863 | +++ lib/kernel/atomics.cl 2012-08-15 20:36:22 +0000 |
864 | @@ -0,0 +1,149 @@ |
865 | +/* OpenCL built-in library: atomic operations |
866 | + |
867 | + Copyright (c) 2012 Universidad Rey Juan Carlos |
868 | + |
869 | + Permission is hereby granted, free of charge, to any person obtaining a copy |
870 | + of this software and associated documentation files (the "Software"), to deal |
871 | + in the Software without restriction, including without limitation the rights |
872 | + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
873 | + copies of the Software, and to permit persons to whom the Software is |
874 | + furnished to do so, subject to the following conditions: |
875 | + |
876 | + The above copyright notice and this permission notice shall be included in |
877 | + all copies or substantial portions of the Software. |
878 | + |
879 | + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
880 | + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
881 | + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
882 | + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
883 | + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
884 | + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
885 | + THE SOFTWARE. |
886 | +*/ |
887 | + |
888 | + |
889 | + |
890 | +// Repeat the content of this file several times with different values |
891 | +// for Q, T, and U: |
892 | +#if !defined(Q) |
893 | + |
894 | +# define Q __global |
895 | +# include "atomics.cl" |
896 | +# undef Q |
897 | + |
898 | +# define Q __local |
899 | +# include "atomics.cl" |
900 | +# undef Q |
901 | + |
902 | +#elif !defined(T) |
903 | + |
904 | +# define T int |
905 | +# define MIN __sync_fetch_and_min |
906 | +# define MAX __sync_fetch_and_max |
907 | +# include "atomics.cl" |
908 | +# undef T |
909 | +# undef MIN |
910 | +# undef MAX |
911 | + |
912 | +# define T uint |
913 | +# define MIN __sync_fetch_and_umin |
914 | +# define MAX __sync_fetch_and_umax |
915 | +# include "atomics.cl" |
916 | +# undef T |
917 | +# undef MIN |
918 | +# undef MAX |
919 | + |
920 | + |
921 | +// xchg is also supported for float as a special case |
922 | +__attribute__((overloadable)) |
923 | +float atomic_xchg(volatile Q float *p, float val) |
924 | +{ |
925 | + // NOTE: We compare the float as int here... |
926 | + return __atomic_exchange_n((volatile int*)p, val, __ATOMIC_RELAXED); |
927 | +} |
928 | + |
929 | +#else |
930 | + |
931 | + |
932 | + |
933 | +// basic |
934 | + |
935 | +// read, add, store |
936 | +__attribute__((overloadable)) |
937 | +T atomic_add(volatile Q T *p, T val) |
938 | +{ |
939 | + return __sync_fetch_and_add((volatile T*)p, val, __ATOMIC_RELAXED); |
940 | +} |
941 | + |
942 | +// read, subtract, store |
943 | +__attribute__((overloadable)) |
944 | +T atomic_sub(volatile Q T *p, T val) |
945 | +{ |
946 | + return __sync_fetch_and_sub(p, val, __ATOMIC_RELAXED); |
947 | +} |
948 | + |
949 | +// read, swap, store |
950 | +__attribute__((overloadable)) |
951 | +T atomic_xchg(volatile Q T *p, T val) |
952 | +{ |
953 | + return __atomic_exchange_n(p, val, __ATOMIC_RELAXED); |
954 | +} |
955 | + |
956 | +// read, increment, store |
957 | +__attribute__((overloadable)) |
958 | +T atomic_inc(volatile Q T *p) |
959 | +{ |
960 | + return atomic_add(p, (T)1); |
961 | +} |
962 | + |
963 | +// read, decrement, store |
964 | +__attribute__((overloadable)) |
965 | +T atomic_dec(volatile Q T *p) |
966 | +{ |
967 | + return atomic_sub(p, (T)1); |
968 | +} |
969 | + |
970 | +// read, store |
971 | +__attribute__((overloadable)) |
972 | +T atomic_cmpxchg(volatile Q T *p, T cmp, T val) |
973 | +{ |
974 | + __atomic_compare_exchange_n(p, &cmp, val, false, |
975 | + __ATOMIC_RELAXED, __ATOMIC_RELAXED); |
976 | + return cmp; |
977 | +} |
978 | + |
979 | +// extended |
980 | + |
981 | +__attribute__((overloadable)) |
982 | +T atomic_min(volatile Q T *p, T val) |
983 | +{ |
984 | + return MIN((volatile T*)p, val); |
985 | +} |
986 | + |
987 | +__attribute__((overloadable)) |
988 | +T atomic_max(volatile Q T *p, T val) |
989 | +{ |
990 | + return MAX((volatile T*)p, val); |
991 | +} |
992 | + |
993 | +__attribute__((overloadable)) |
994 | +T atomic_and(volatile Q T *p, T val) |
995 | +{ |
996 | + return __sync_fetch_and_and(p, val, __ATOMIC_RELAXED); |
997 | +} |
998 | + |
999 | +__attribute__((overloadable)) |
1000 | +T atomic_or(volatile Q T *p, T val) |
1001 | +{ |
1002 | + return __sync_fetch_and_or(p, val, __ATOMIC_RELAXED); |
1003 | +} |
1004 | + |
1005 | +__attribute__((overloadable)) |
1006 | +T atomic_xor(volatile Q T *p, T val) |
1007 | +{ |
1008 | + return __sync_fetch_and_xor(p, val, __ATOMIC_RELAXED); |
1009 | +} |
1010 | + |
1011 | + |
1012 | + |
1013 | +#endif |
1014 | |
1015 | === modified file 'lib/kernel/hadd.cl' |
1016 | --- lib/kernel/hadd.cl 2011-10-26 13:03:37 +0000 |
1017 | +++ lib/kernel/hadd.cl 2012-08-15 20:36:22 +0000 |
1018 | @@ -23,7 +23,5 @@ |
1019 | |
1020 | #include "templates.h" |
1021 | |
1022 | -// This could do with some testing |
1023 | -// This could probably also be optimised (i.e. the ?: operators eliminated) |
1024 | DEFINE_EXPR_G_GG(hadd, |
1025 | (a >> (sgtype)1) + (b >> (sgtype)1) + (a & b & (gtype)1)) |
1026 | |
1027 | === modified file 'lib/kernel/rhadd.cl' |
1028 | --- lib/kernel/rhadd.cl 2011-10-26 13:03:37 +0000 |
1029 | +++ lib/kernel/rhadd.cl 2012-08-15 20:36:22 +0000 |
1030 | @@ -23,7 +23,5 @@ |
1031 | |
1032 | #include "templates.h" |
1033 | |
1034 | -// This could do with some testing |
1035 | -// This could probably also be optimised (i.e. the ?: operators eliminated) |
1036 | DEFINE_EXPR_G_GG(rhadd, |
1037 | (a >> (sgtype)1) + (b >> (sgtype)1) + ((a | b) & (gtype)1)) |
1038 | |
1039 | === modified file 'lib/kernel/sources.mk' |
1040 | --- lib/kernel/sources.mk 2012-06-04 12:15:18 +0000 |
1041 | +++ lib/kernel/sources.mk 2012-08-15 20:36:22 +0000 |
1042 | @@ -1,144 +1,146 @@ |
1043 | # Nodist here because these files should be included |
1044 | # to the distribution only once, from the root kernel |
1045 | # makefile. |
1046 | -nodist_libkernel_a_SOURCES = templates.h \ |
1047 | - barrier.ll \ |
1048 | - image.h \ |
1049 | - get_work_dim.c \ |
1050 | - get_global_size.c \ |
1051 | - get_global_id.c \ |
1052 | - get_local_size.c \ |
1053 | - get_local_id.c \ |
1054 | - get_num_groups.c \ |
1055 | - get_group_id.c \ |
1056 | - get_global_offset.c \ |
1057 | - as_type.cl \ |
1058 | - convert_type.cl \ |
1059 | - acos.cl \ |
1060 | - acosh.cl \ |
1061 | - acospi.cl \ |
1062 | - asin.cl \ |
1063 | - asinh.cl \ |
1064 | - asinpi.cl \ |
1065 | - atan.cl \ |
1066 | - atan2.cl \ |
1067 | - atan2pi.cl \ |
1068 | - atanh.cl \ |
1069 | - atanpi.cl \ |
1070 | - cbrt.cl \ |
1071 | - ceil.cl \ |
1072 | - copysign.cl \ |
1073 | - cos.cl \ |
1074 | - cosh.cl \ |
1075 | - cospi.cl \ |
1076 | - erfc.cl \ |
1077 | - erf.cl \ |
1078 | - exp.cl \ |
1079 | - exp2.cl \ |
1080 | - exp10.cl \ |
1081 | - expm1.cl \ |
1082 | - fabs.cl \ |
1083 | - fdim.cl \ |
1084 | - floor.cl \ |
1085 | - fma.cl \ |
1086 | - fmax.cl \ |
1087 | - fmin.cl \ |
1088 | - fmod.cl \ |
1089 | - fract.cl \ |
1090 | - hypot.cl \ |
1091 | - ilogb.cl \ |
1092 | - ldexp.cl \ |
1093 | - lgamma.cl \ |
1094 | - log.cl \ |
1095 | - log2.cl \ |
1096 | - log10.cl \ |
1097 | - log1p.cl \ |
1098 | - logb.cl \ |
1099 | - mad.cl \ |
1100 | - maxmag.cl \ |
1101 | - minmag.cl \ |
1102 | - nan.cl \ |
1103 | - nextafter.cl \ |
1104 | - pow.cl \ |
1105 | - pown.cl \ |
1106 | - powr.cl \ |
1107 | - remainder.cl \ |
1108 | - rint.cl \ |
1109 | - rootn.cl \ |
1110 | - round.cl \ |
1111 | - rsqrt.cl \ |
1112 | - sin.cl \ |
1113 | - sincos.cl \ |
1114 | - sinh.cl \ |
1115 | - sinpi.cl \ |
1116 | - sqrt.cl \ |
1117 | - tan.cl \ |
1118 | - tanh.cl \ |
1119 | - tanpi.cl \ |
1120 | - tgamma.cl \ |
1121 | - trunc.cl \ |
1122 | - divide.cl \ |
1123 | - recip.cl \ |
1124 | - abs.cl \ |
1125 | - abs_diff.cl \ |
1126 | - add_sat.cl \ |
1127 | - hadd.cl \ |
1128 | - rhadd.cl \ |
1129 | - clamp.cl \ |
1130 | - clz.cl \ |
1131 | - mad_hi.cl \ |
1132 | - mad_sat.cl \ |
1133 | - max.cl \ |
1134 | - min.cl \ |
1135 | - mul_hi.cl \ |
1136 | - rotate.cl \ |
1137 | - sub_sat.cl \ |
1138 | - upsample.cl \ |
1139 | - popcount.cl \ |
1140 | - mad24.cl \ |
1141 | - mul24.cl \ |
1142 | - degrees.cl \ |
1143 | - mix.cl \ |
1144 | - radians.cl \ |
1145 | - step.cl \ |
1146 | - smoothstep.cl \ |
1147 | - sign.cl \ |
1148 | - cross.cl \ |
1149 | - dot.cl \ |
1150 | - distance.cl \ |
1151 | - length.cl \ |
1152 | - normalize.cl \ |
1153 | - fast_distance.cl \ |
1154 | - fast_length.cl \ |
1155 | - fast_normalize.cl \ |
1156 | - isequal.cl \ |
1157 | - isnotequal.cl \ |
1158 | - isgreater.cl \ |
1159 | - isgreaterequal.cl \ |
1160 | - isless.cl \ |
1161 | - islessequal.cl \ |
1162 | - islessgreater.cl \ |
1163 | - isfinite.cl \ |
1164 | - isinf.cl \ |
1165 | - isnan.cl \ |
1166 | - isnormal.cl \ |
1167 | - isordered.cl \ |
1168 | - isunordered.cl \ |
1169 | - signbit.cl \ |
1170 | - any.cl \ |
1171 | - all.cl \ |
1172 | - bitselect.cl \ |
1173 | - select.cl \ |
1174 | - vload.cl \ |
1175 | - vstore.cl \ |
1176 | - vload_half.cl \ |
1177 | - vstore_half.cl \ |
1178 | - async_work_group_copy.cl \ |
1179 | - wait_group_events.cl \ |
1180 | - read_image.cl \ |
1181 | - write_image.cl \ |
1182 | - get_image_width.cl \ |
1183 | - get_image_height.cl |
1184 | +nodist_libkernel_a_SOURCES = \ |
1185 | + templates.h \ |
1186 | + barrier.ll \ |
1187 | + image.h \ |
1188 | + get_work_dim.c \ |
1189 | + get_global_size.c \ |
1190 | + get_global_id.c \ |
1191 | + get_local_size.c \ |
1192 | + get_local_id.c \ |
1193 | + get_num_groups.c \ |
1194 | + get_group_id.c \ |
1195 | + get_global_offset.c \ |
1196 | + as_type.cl \ |
1197 | + atomics.cl \ |
1198 | + convert_type.cl \ |
1199 | + acos.cl \ |
1200 | + acosh.cl \ |
1201 | + acospi.cl \ |
1202 | + asin.cl \ |
1203 | + asinh.cl \ |
1204 | + asinpi.cl \ |
1205 | + atan.cl \ |
1206 | + atan2.cl \ |
1207 | + atan2pi.cl \ |
1208 | + atanh.cl \ |
1209 | + atanpi.cl \ |
1210 | + cbrt.cl \ |
1211 | + ceil.cl \ |
1212 | + copysign.cl \ |
1213 | + cos.cl \ |
1214 | + cosh.cl \ |
1215 | + cospi.cl \ |
1216 | + erfc.cl \ |
1217 | + erf.cl \ |
1218 | + exp.cl \ |
1219 | + exp2.cl \ |
1220 | + exp10.cl \ |
1221 | + expm1.cl \ |
1222 | + fabs.cl \ |
1223 | + fdim.cl \ |
1224 | + floor.cl \ |
1225 | + fma.cl \ |
1226 | + fmax.cl \ |
1227 | + fmin.cl \ |
1228 | + fmod.cl \ |
1229 | + fract.cl \ |
1230 | + hypot.cl \ |
1231 | + ilogb.cl \ |
1232 | + ldexp.cl \ |
1233 | + lgamma.cl \ |
1234 | + log.cl \ |
1235 | + log2.cl \ |
1236 | + log10.cl \ |
1237 | + log1p.cl \ |
1238 | + logb.cl \ |
1239 | + mad.cl \ |
1240 | + maxmag.cl \ |
1241 | + minmag.cl \ |
1242 | + nan.cl \ |
1243 | + nextafter.cl \ |
1244 | + pow.cl \ |
1245 | + pown.cl \ |
1246 | + powr.cl \ |
1247 | + remainder.cl \ |
1248 | + rint.cl \ |
1249 | + rootn.cl \ |
1250 | + round.cl \ |
1251 | + rsqrt.cl \ |
1252 | + sin.cl \ |
1253 | + sincos.cl \ |
1254 | + sinh.cl \ |
1255 | + sinpi.cl \ |
1256 | + sqrt.cl \ |
1257 | + tan.cl \ |
1258 | + tanh.cl \ |
1259 | + tanpi.cl \ |
1260 | + tgamma.cl \ |
1261 | + trunc.cl \ |
1262 | + divide.cl \ |
1263 | + recip.cl \ |
1264 | + abs.cl \ |
1265 | + abs_diff.cl \ |
1266 | + add_sat.cl \ |
1267 | + hadd.cl \ |
1268 | + rhadd.cl \ |
1269 | + clamp.cl \ |
1270 | + clz.cl \ |
1271 | + mad_hi.cl \ |
1272 | + mad_sat.cl \ |
1273 | + max.cl \ |
1274 | + min.cl \ |
1275 | + mul_hi.cl \ |
1276 | + rotate.cl \ |
1277 | + sub_sat.cl \ |
1278 | + upsample.cl \ |
1279 | + popcount.cl \ |
1280 | + mad24.cl \ |
1281 | + mul24.cl \ |
1282 | + degrees.cl \ |
1283 | + mix.cl \ |
1284 | + radians.cl \ |
1285 | + step.cl \ |
1286 | + smoothstep.cl \ |
1287 | + sign.cl \ |
1288 | + cross.cl \ |
1289 | + dot.cl \ |
1290 | + distance.cl \ |
1291 | + length.cl \ |
1292 | + normalize.cl \ |
1293 | + fast_distance.cl \ |
1294 | + fast_length.cl \ |
1295 | + fast_normalize.cl \ |
1296 | + isequal.cl \ |
1297 | + isnotequal.cl \ |
1298 | + isgreater.cl \ |
1299 | + isgreaterequal.cl \ |
1300 | + isless.cl \ |
1301 | + islessequal.cl \ |
1302 | + islessgreater.cl \ |
1303 | + isfinite.cl \ |
1304 | + isinf.cl \ |
1305 | + isnan.cl \ |
1306 | + isnormal.cl \ |
1307 | + isordered.cl \ |
1308 | + isunordered.cl \ |
1309 | + signbit.cl \ |
1310 | + any.cl \ |
1311 | + all.cl \ |
1312 | + bitselect.cl \ |
1313 | + select.cl \ |
1314 | + vload.cl \ |
1315 | + vstore.cl \ |
1316 | + vload_half.cl \ |
1317 | + vstore_half.cl \ |
1318 | + async_work_group_copy.cl \ |
1319 | + wait_group_events.cl \ |
1320 | + read_image.cl \ |
1321 | + write_image.cl \ |
1322 | + get_image_width.cl \ |
1323 | + get_image_height.cl |
1324 | |
1325 | barrier.o: barrier.ll |
1326 | $(LLVM_AS) -o $@ $< |
1327 | |
1328 | === modified file 'lib/kernel/sub_sat.cl' |
1329 | --- lib/kernel/sub_sat.cl 2011-10-26 19:49:23 +0000 |
1330 | +++ lib/kernel/sub_sat.cl 2012-08-15 20:36:22 +0000 |
1331 | @@ -30,8 +30,6 @@ |
1332 | // ushort __builtin_ia32_psubusw128 |
1333 | // Other types don't seem to be supported. |
1334 | |
1335 | -// This could do with some testing |
1336 | -// This could probably also be optimised (i.e. the ?: operators eliminated) |
1337 | DEFINE_EXPR_G_GG(sub_sat, |
1338 | (sgtype)-1 < (sgtype)0 ? |
1339 | /* signed */ |
1340 | |
1341 | === modified file 'lib/llvmopencl/Makefile.am' |
1342 | --- lib/llvmopencl/Makefile.am 2012-06-15 12:26:40 +0000 |
1343 | +++ lib/llvmopencl/Makefile.am 2012-08-15 20:36:22 +0000 |
1344 | @@ -40,4 +40,4 @@ |
1345 | ImplicitLoopBarriers.h ImplicitLoopBarriers.cc \ |
1346 | WorkItemAliasAnalysis.cc WIVectorize.cc |
1347 | |
1348 | -#llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@ |
1349 | \ No newline at end of file |
1350 | +#llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@ |
1351 | |
1352 | === modified file 'tests/regression/test_constant_array.cpp' |
1353 | --- tests/regression/test_constant_array.cpp 2012-08-02 14:18:02 +0000 |
1354 | +++ tests/regression/test_constant_array.cpp 2012-08-15 20:36:22 +0000 |
1355 | @@ -29,6 +29,7 @@ |
1356 | #include <cstdio> |
1357 | #include <cstdlib> |
1358 | #include <iostream> |
1359 | +#include <unistd.h> |
1360 | |
1361 | #define WORK_ITEMS 1 |
1362 | |
1363 | |
1364 | === modified file 'tests/regression/test_infinite_loop.cpp' |
1365 | --- tests/regression/test_infinite_loop.cpp 2012-08-02 14:18:02 +0000 |
1366 | +++ tests/regression/test_infinite_loop.cpp 2012-08-15 20:36:22 +0000 |
1367 | @@ -28,6 +28,7 @@ |
1368 | #include <cstdio> |
1369 | #include <cstdlib> |
1370 | #include <iostream> |
1371 | +#include <unistd.h> |
1372 | |
1373 | #define WORK_ITEMS 1 |
1374 |
Fixes should be committed to the pocl/0.6 branch first and then 0.6 merged to trunk so we get fixes to the 0.6 release. Otherwise, looks good to me. Please commit to the lp:pocl/0.6 branch directly (which we can merge to trunk later). I'm probably going to roll a yet another 0.6 release candidate package later this week.