Merge lp:~schnetter/pocl/pocl into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter
Status: Merged
Merged at revision: 335
Proposed branch: lp:~schnetter/pocl/pocl
Merge into: lp:~pocl/pocl/trunk
Diff against target: 1371 lines (+573/-231)
27 files modified
INSTALL (+5/-1)
include/_kernel.h (+35/-0)
lib/CL/clEnqueueCopyBuffer.c (+18/-2)
lib/CL/clEnqueueCopyBufferRect.c (+44/-1)
lib/CL/clEnqueueNDRangeKernel.c (+2/-0)
lib/CL/clEnqueueReadBuffer.c (+15/-5)
lib/CL/clEnqueueReadBufferRect.c (+52/-12)
lib/CL/clEnqueueUnmapMemObject.c (+1/-1)
lib/CL/clEnqueueWriteBuffer.c (+12/-17)
lib/CL/clEnqueueWriteBufferRect.c (+32/-20)
lib/CL/clFinish.c (+6/-1)
lib/CL/clGetDeviceInfo.c (+21/-21)
lib/CL/clGetKernelWorkGroupInfo.c (+1/-1)
lib/CL/clReleaseProgram.c (+1/-1)
lib/CL/devices/basic/basic.h (+10/-0)
lib/CL/devices/common.h (+1/-0)
lib/CL/devices/devices.c (+2/-2)
lib/CL/devices/pthread/pocl-pthread.h (+10/-0)
lib/CL/pocl_cl.h (+13/-1)
lib/kernel/atomics.cl (+149/-0)
lib/kernel/hadd.cl (+0/-2)
lib/kernel/rhadd.cl (+0/-2)
lib/kernel/sources.mk (+140/-138)
lib/kernel/sub_sat.cl (+0/-2)
lib/llvmopencl/Makefile.am (+1/-1)
tests/regression/test_constant_array.cpp (+1/-0)
tests/regression/test_infinite_loop.cpp (+1/-0)
To merge this branch: bzr merge lp:~schnetter/pocl/pocl
Reviewer Review Type Date Requested Status
pocl maintaners Pending
Review via email: mp+119393@code.launchpad.net

Description of the change

I corrected several build errors (and some cosmetic issues) for Mac OS X.

To post a comment you must log in.
Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Fixes should be committed to the pocl/0.6 branch first and then 0.6 merged to trunk so we get fixes to the 0.6 release. Otherwise, looks good to me. Please commit to the lp:pocl/0.6 branch directly (which we can merge to trunk later). I'm probably going to roll a yet another 0.6 release candidate package later this week.

Revision history for this message
Erik Schnetter (schnetter) wrote :

I'm still relatively unfamiliar with bzr. How exactly would I "commit directly to 0.6"? I assume I check out the 0.6 branch first, and then what? Would "bzr merge -i" do the trick that allows me to select the fixes?

Should I commit my cosmetic changes as well, or should I commit those to the trunk only?

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlocation or bzr merge -rFIRSTREV..LASTREV ../yourbranchlocation or similar) then commit (if a bound branch, just bzr commit in the 0.6 tree). Though, cherry picking book keeping is not the best in Bazaar, and IIRC the '-c' that selects a single rev is the most robust in "backporting" revisions. I cannot recall the exact problems involved but I've bumped into them in the past.

IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/regressions.

Revision history for this message
Erik Schnetter (schnetter) wrote :

I cherry-picked the changes into a local checkout of the 0.6 branch.
However, it appears as if I only received the changes, not the
commits? I now have a large number of "locally modified" files... Is
this to be expected? How should I handle this?

-erik

On Mon, Aug 13, 2012 at 11:33 PM, Pekka Jääskeläinen
<email address hidden> wrote:
> Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlocation or bzr merge -rFIRSTREV..LASTREV ../yourbranchlocation or similar) then commit (if a bound branch, just bzr commit in the 0.6 tree). Though, cherry picking book keeping is not the best in Bazaar, and IIRC the '-c' that selects a single rev is the most robust in "backporting" revisions. I cannot recall the exact problems involved but I've bumped into them in the past.
>
> IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/regressions.
> --
> https://code.launchpad.net/~schnetter/pocl/pocl/+merge/119393
> You are the owner of lp:~schnetter/pocl/pocl.

--
Erik Schnetter <email address hidden>
http://www.perimeterinstitute.ca/personal/eschnetter/

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

That's normal. The "cherry picking" of Bazaar is not perfect in this regard. http://doc.bazaar.canonical.com/latest/en/user-guide/adv_merging.html

lp:~schnetter/pocl/pocl updated
334. By Erik Schnetter

Correct several build errors for Mac OS X

Revision history for this message
Erik Schnetter (schnetter) wrote :

I committed my changes (except adding the atomics) to the 0.6 branch, then merged the branch into the trunk.

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

I can't see them in 0.6. Yep, better put the atomics to trunk, let's try to keep 0.6 in feature freeze.

lp:~schnetter/pocl/pocl updated
335. By Erik Schnetter

Implement atomics

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'INSTALL'
2--- INSTALL 2012-08-08 12:45:17 +0000
3+++ INSTALL 2012-08-15 20:36:22 +0000
4@@ -36,10 +36,14 @@
5 ./configure --enable-shared --prefix=YOUR_INSTALLATION_PREFIX_HERE
6 make REQUIRES_RTTI=1 && make install
7
8-
9 !!!NOTE: A 64-bit host is at the moment required. See
10 https://bugs.launchpad.net/pocl/+bug/911911
11
12+!!!NOTE: On Mac OS X, you cannot build LLVM with LLVM; you need to
13+build using gcc. Use e.g. this command to configure instead:
14+
15+ ./configure --enable-shared CC=gcc CXX=g++ --prefix=YOUR_INSTALLATION_PREFIX_HERE
16+
17 After all the requirements are installed. The installation procedure
18 follows the usual autotools build+install. If you are using a development
19 source tree, you need to generate the autotool build files with
20
21=== modified file 'include/_kernel.h'
22--- include/_kernel.h 2012-05-25 16:05:51 +0000
23+++ include/_kernel.h 2012-08-15 20:36:22 +0000
24@@ -1608,6 +1608,41 @@
25 #endif
26
27
28
29+/* Atomic operations */
30+
31+#define _CL_DECLARE_ATOMICS(MOD, TYPE) \
32+ _cl_overloadable TYPE atomic_add (volatile MOD TYPE *p, TYPE val); \
33+ _cl_overloadable TYPE atomic_sub (volatile MOD TYPE *p, TYPE val); \
34+ _cl_overloadable TYPE atomic_xchg (volatile MOD TYPE *p, TYPE val); \
35+ _cl_overloadable TYPE atomic_inc (volatile MOD TYPE *p); \
36+ _cl_overloadable TYPE atomic_dec (volatile MOD TYPE *p); \
37+ _cl_overloadable TYPE atomic_cmpxchg(volatile MOD TYPE *p, TYPE cmp, TYPE val); \
38+ _cl_overloadable TYPE atomic_min (volatile MOD TYPE *p, TYPE val); \
39+ _cl_overloadable TYPE atomic_max (volatile MOD TYPE *p, TYPE val); \
40+ _cl_overloadable TYPE atomic_and (volatile MOD TYPE *p, TYPE val); \
41+ _cl_overloadable TYPE atomic_or (volatile MOD TYPE *p, TYPE val); \
42+ _cl_overloadable TYPE atomic_xor (volatile MOD TYPE *p, TYPE val);
43+_CL_DECLARE_ATOMICS(__global, int )
44+_CL_DECLARE_ATOMICS(__global, uint)
45+_CL_DECLARE_ATOMICS(__local , int )
46+_CL_DECLARE_ATOMICS(__local , uint)
47+
48+_cl_overloadable float atomic_xchg(volatile __global float *p, float val);
49+_cl_overloadable float atomic_xchg(volatile __local float *p, float val);
50+
51+#define atom_add atomic_add
52+#define atom_sub atomic_sub
53+#define atom_xchg atomic_xchg
54+#define atom_inc atomic_inc
55+#define atom_dec atomic_dec
56+#define atom_cmpxchg atomic_cmpxchg
57+#define atom_min atomic_min
58+#define atom_max atomic_max
59+#define atom_and atomic_and
60+#define atom_or atomic_or
61+#define atom_xor atomic_xor
62+
63+
64
65 /* Miscellaneous Vector Functions */
66
67 // This code leads to an ICE in Clang
68
69=== modified file 'lib/CL/clEnqueueCopyBuffer.c'
70--- lib/CL/clEnqueueCopyBuffer.c 2012-05-14 11:45:48 +0000
71+++ lib/CL/clEnqueueCopyBuffer.c 2012-08-15 20:36:22 +0000
72@@ -23,6 +23,7 @@
73 */
74
75 #include "pocl_cl.h"
76+#include "pocl_icd.h"
77 #include "utlist.h"
78 #include <assert.h>
79
80@@ -55,14 +56,28 @@
81 return CL_INVALID_VALUE;
82
83 device_id = command_queue->device;
84+
85 for (i = 0; i < command_queue->context->num_devices; ++i)
86 {
87 if (command_queue->context->devices[i] == device_id)
88 break;
89 }
90-
91 assert(i < command_queue->context->num_devices);
92
93+ if (event != NULL)
94+ {
95+ *event = (cl_event)malloc(sizeof(struct _cl_event));
96+ if (*event == NULL)
97+ return CL_OUT_OF_HOST_MEMORY;
98+ POCL_INIT_OBJECT(*event);
99+ (*event)->queue = command_queue;
100+ POCL_INIT_ICD_OBJECT(*event);
101+ clRetainCommandQueue (command_queue);
102+
103+ POCL_PROFILE_QUEUED;
104+ }
105+
106+
107 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));
108 if (cmd == NULL)
109 return CL_OUT_OF_HOST_MEMORY;
110@@ -81,8 +96,9 @@
111 cmd->command.copy.dst_ptr = dst_buffer->device_ptrs[device_id->dev_id] + dst_offset;
112 cmd->command.copy.cb = cb;
113 cmd->next = NULL;
114+ cmd->event = event ? *event : NULL;
115
116- LL_APPEND(command_queue->root, cmd );
117+ LL_APPEND(command_queue->root, cmd);
118
119 return CL_SUCCESS;
120 }
121
122=== modified file 'lib/CL/clEnqueueCopyBufferRect.c'
123--- lib/CL/clEnqueueCopyBufferRect.c 2012-05-14 11:45:48 +0000
124+++ lib/CL/clEnqueueCopyBufferRect.c 2012-08-15 20:36:22 +0000
125@@ -22,6 +22,7 @@
126 */
127
128 #include "pocl_cl.h"
129+#include "pocl_icd.h"
130 #include <assert.h>
131
132 CL_API_ENTRY cl_int CL_API_CALL
133@@ -69,14 +70,51 @@
134 return CL_INVALID_VALUE;
135
136 device_id = command_queue->device;
137+
138 for (i = 0; i < command_queue->context->num_devices; ++i)
139 {
140 if (command_queue->context->devices[i] == device_id)
141 break;
142 }
143-
144 assert(i < command_queue->context->num_devices);
145
146+ if (event != NULL)
147+ {
148+ *event = (cl_event)malloc(sizeof(struct _cl_event));
149+ if (*event == NULL)
150+ return CL_OUT_OF_HOST_MEMORY;
151+ POCL_INIT_OBJECT(*event);
152+ (*event)->queue = command_queue;
153+ POCL_INIT_ICD_OBJECT(*event);
154+
155+ clRetainCommandQueue (command_queue);
156+
157+ POCL_PROFILE_QUEUED;
158+ }
159+
160+
161+ /* execute directly */
162+ /* TODO: enqueue the read_rect if this is a non-blocking read (see
163+ clEnqueueReadBuffer) */
164+ if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
165+ {
166+ /* wait for the event in event_wait_list to finish */
167+ POCL_ABORT_UNIMPLEMENTED();
168+ }
169+ else
170+ {
171+ /* in-order queue - all previously enqueued commands must
172+ * finish before this read */
173+ // ensure our buffer is not freed yet
174+ clRetainMemObject (src_buffer);
175+ clRetainMemObject (dst_buffer);
176+ clFinish(command_queue);
177+ }
178+ POCL_PROFILE_SUBMITTED;
179+ POCL_PROFILE_RUNNING;
180+
181+ /* TODO: offset computation doesn't work in case the ptr is not
182+ a direct pointer */
183 device_id->copy_rect(device_id->data,
184 src_buffer->device_ptrs[device_id->dev_id],
185 dst_buffer->device_ptrs[device_id->dev_id],
186@@ -84,5 +122,10 @@
187 src_row_pitch, src_slice_pitch,
188 dst_row_pitch, dst_slice_pitch);
189
190+ POCL_PROFILE_COMPLETE;
191+
192+ clReleaseMemObject (src_buffer);
193+ clReleaseMemObject (dst_buffer);
194+
195 return CL_SUCCESS;
196 }
197
198=== modified file 'lib/CL/clEnqueueNDRangeKernel.c'
199--- lib/CL/clEnqueueNDRangeKernel.c 2012-05-30 14:10:44 +0000
200+++ lib/CL/clEnqueueNDRangeKernel.c 2012-08-15 20:36:22 +0000
201@@ -314,6 +314,8 @@
202 }
203 }
204
205+ command_node->event = event ? *event : NULL;
206+
207 LL_APPEND(command_queue->root, command_node);
208
209 return CL_SUCCESS;
210
211=== modified file 'lib/CL/clEnqueueReadBuffer.c'
212--- lib/CL/clEnqueueReadBuffer.c 2012-05-29 12:16:56 +0000
213+++ lib/CL/clEnqueueReadBuffer.c 2012-08-15 20:36:22 +0000
214@@ -38,6 +38,7 @@
215 cl_event *event) CL_API_SUFFIX__VERSION_1_0
216 {
217 cl_device_id device;
218+ unsigned i;
219
220 if (command_queue == NULL)
221 return CL_INVALID_COMMAND_QUEUE;
222@@ -54,6 +55,13 @@
223
224 device = command_queue->device;
225
226+ for (i = 0; i < command_queue->context->num_devices; ++i)
227+ {
228+ if (command_queue->context->devices[i] == device)
229+ break;
230+ }
231+ assert(i < command_queue->context->num_devices);
232+
233 if (event != NULL)
234 {
235 *event = (cl_event)malloc(sizeof(struct _cl_event));
236@@ -62,14 +70,15 @@
237 POCL_INIT_OBJECT(*event);
238 (*event)->queue = command_queue;
239 POCL_INIT_ICD_OBJECT(*event);
240-
241 clRetainCommandQueue (command_queue);
242
243- POCL_PROFILE_QUEUED;
244+ POCL_PROFILE_QUEUED;
245 }
246
247
248 /* enqueue the read, or execute directly */
249+ /* TODO: why do we implement both? direct execution seems
250+ unnecessary. */
251 if (blocking_read)
252 {
253 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
254@@ -101,16 +110,17 @@
255 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));
256 if (cmd == NULL)
257 return CL_OUT_OF_HOST_MEMORY;
258-
259+
260 cmd->type = CL_COMMAND_TYPE_READ;
261 cmd->command.read.data = device->data;
262 cmd->command.read.host_ptr = ptr;
263 cmd->command.read.device_ptr = buffer->device_ptrs[device->dev_id]+offset;
264 cmd->command.read.cb = cb;
265+ cmd->command.read.buffer = buffer;
266 cmd->next = NULL;
267- cmd->command.read.buffer = buffer;
268+ cmd->event = event ? *event : NULL;
269 clRetainMemObject (buffer);
270- LL_APPEND(command_queue->root, cmd );
271+ LL_APPEND(command_queue->root, cmd);
272 }
273
274 return CL_SUCCESS;
275
276=== modified file 'lib/CL/clEnqueueReadBufferRect.c'
277--- lib/CL/clEnqueueReadBufferRect.c 2012-05-25 16:05:51 +0000
278+++ lib/CL/clEnqueueReadBufferRect.c 2012-08-15 20:36:22 +0000
279@@ -22,6 +22,7 @@
280 */
281
282 #include "pocl_cl.h"
283+#include "pocl_icd.h"
284 #include <assert.h>
285 #include <stdio.h>
286
287@@ -41,7 +42,7 @@
288 const cl_event *event_wait_list,
289 cl_event *event) CL_API_SUFFIX__VERSION_1_1
290 {
291- cl_device_id device_id;
292+ cl_device_id device;
293 unsigned i;
294
295 if (command_queue == NULL)
296@@ -58,28 +59,67 @@
297 (host_origin == NULL) ||
298 (region == NULL))
299 return CL_INVALID_VALUE;
300-
301+
302 if ((region[0]*region[1]*region[2] > 0) &&
303 (buffer_origin[0] + region[0]-1 +
304 buffer_row_pitch * (buffer_origin[1] + region[1]-1) +
305 buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size))
306 return CL_INVALID_VALUE;
307
308- device_id = command_queue->device;
309+ device = command_queue->device;
310+
311 for (i = 0; i < command_queue->context->num_devices; ++i)
312 {
313- if (command_queue->context->devices[i] == device_id)
314- break;
315+ if (command_queue->context->devices[i] == device)
316+ break;
317 }
318-
319 assert(i < command_queue->context->num_devices);
320
321- device_id->read_rect(device_id->data, ptr,
322- buffer->device_ptrs[device_id->dev_id],
323- buffer_origin, host_origin, region,
324- buffer_row_pitch, buffer_slice_pitch,
325- host_row_pitch, host_slice_pitch);
326-
327+ if (event != NULL)
328+ {
329+ *event = (cl_event)malloc(sizeof(struct _cl_event));
330+ if (*event == NULL)
331+ return CL_OUT_OF_HOST_MEMORY;
332+ POCL_INIT_OBJECT(*event);
333+ (*event)->queue = command_queue;
334+ POCL_INIT_ICD_OBJECT(*event);
335+
336+ clRetainCommandQueue (command_queue);
337+
338+ POCL_PROFILE_QUEUED;
339+ }
340+
341+
342+ /* execute directly */
343+ /* TODO: enqueue the read_rect if this is a non-blocking read (see
344+ clEnqueueReadBuffer) */
345+ if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
346+ {
347+ /* wait for the event in event_wait_list to finish */
348+ POCL_ABORT_UNIMPLEMENTED();
349+ }
350+ else
351+ {
352+ /* in-order queue - all previously enqueued commands must
353+ * finish before this read */
354+ // ensure our buffer is not freed yet
355+ clRetainMemObject (buffer);
356+ clFinish(command_queue);
357+ }
358+ POCL_PROFILE_SUBMITTED;
359+ POCL_PROFILE_RUNNING;
360+
361+ /* TODO: offset computation doesn't work in case the ptr is not
362+ a direct pointer */
363+ device->read_rect(device->data, ptr,
364+ buffer->device_ptrs[device->dev_id],
365+ buffer_origin, host_origin, region,
366+ buffer_row_pitch, buffer_slice_pitch,
367+ host_row_pitch, host_slice_pitch);
368+
369+ POCL_PROFILE_COMPLETE;
370+
371+ clReleaseMemObject (buffer);
372
373 return CL_SUCCESS;
374 }
375
376=== modified file 'lib/CL/clEnqueueUnmapMemObject.c'
377--- lib/CL/clEnqueueUnmapMemObject.c 2012-05-29 12:16:56 +0000
378+++ lib/CL/clEnqueueUnmapMemObject.c 2012-08-15 20:36:22 +0000
379@@ -104,6 +104,6 @@
380 DL_DELETE(memobj->mappings, mapping);
381 memobj->map_count--;
382 clReleaseMemObject (memobj);
383- event = NULL;
384+
385 return CL_SUCCESS;
386 }
387
388=== modified file 'lib/CL/clEnqueueWriteBuffer.c'
389--- lib/CL/clEnqueueWriteBuffer.c 2012-05-29 12:16:56 +0000
390+++ lib/CL/clEnqueueWriteBuffer.c 2012-08-15 20:36:22 +0000
391@@ -37,7 +37,7 @@
392 const cl_event *event_wait_list,
393 cl_event *event) CL_API_SUFFIX__VERSION_1_0
394 {
395- cl_device_id device_id;
396+ cl_device_id device;
397 unsigned i;
398
399 if (command_queue == NULL)
400@@ -53,13 +53,13 @@
401 (offset + cb > buffer->size))
402 return CL_INVALID_VALUE;
403
404- device_id = command_queue->device;
405+ device = command_queue->device;
406+
407 for (i = 0; i < command_queue->context->num_devices; ++i)
408 {
409- if (command_queue->context->devices[i] == device_id)
410- break;
411+ if (command_queue->context->devices[i] == device)
412+ break;
413 }
414-
415 assert(i < command_queue->context->num_devices);
416
417 if (event != NULL)
418@@ -76,6 +76,8 @@
419 }
420
421 /* enqueue the write, or execute directly */
422+ /* TODO: why do we implement both? direct execution seems
423+ unnecessary. */
424 if (blocking_write)
425 {
426 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
427@@ -94,17 +96,9 @@
428 POCL_PROFILE_SUBMITTED;
429 POCL_PROFILE_RUNNING;
430 /* TODO: fixme. The offset computation must be done at the device driver. */
431- device_id->write(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id]+offset, cb);
432+ device->write(device->data, ptr, buffer->device_ptrs[device->dev_id]+offset, cb);
433 POCL_PROFILE_COMPLETE;
434
435- if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE &&
436- event != NULL)
437- {
438- (*event)->status = CL_COMPLETE;
439- (*event)->time_end =
440- command_queue->device->get_timer_value(command_queue->device->data);
441- }
442-
443 clReleaseMemObject (buffer);
444 }
445 else
446@@ -112,14 +106,15 @@
447 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));
448 if (cmd == NULL)
449 return CL_OUT_OF_HOST_MEMORY;
450-
451+
452 cmd->type = CL_COMMAND_TYPE_WRITE;
453- cmd->command.write.data = device_id->data;
454+ cmd->command.write.data = device->data;
455 cmd->command.write.host_ptr = ptr;
456 cmd->command.write.device_ptr = buffer->device_ptrs[i]+offset;
457 cmd->command.write.cb = cb;
458+ cmd->command.write.buffer = buffer;
459 cmd->next = NULL;
460- cmd->command.write.buffer = buffer;
461+ cmd->event = event ? *event : NULL;
462 clRetainMemObject (buffer);
463
464 LL_APPEND(command_queue->root, cmd);
465
466=== modified file 'lib/CL/clEnqueueWriteBufferRect.c'
467--- lib/CL/clEnqueueWriteBufferRect.c 2012-05-14 11:45:48 +0000
468+++ lib/CL/clEnqueueWriteBufferRect.c 2012-08-15 20:36:22 +0000
469@@ -40,7 +40,7 @@
470 const cl_event *event_wait_list,
471 cl_event *event) CL_API_SUFFIX__VERSION_1_1
472 {
473- cl_device_id device_id;
474+ cl_device_id device;
475 unsigned i;
476
477 if (command_queue == NULL)
478@@ -57,43 +57,55 @@
479 (host_origin == NULL) ||
480 (region == NULL))
481 return CL_INVALID_VALUE;
482-
483+
484 if ((region[0]*region[1]*region[2] > 0) &&
485 (buffer_origin[0] + region[0]-1 +
486 buffer_row_pitch * (buffer_origin[1] + region[1]-1) +
487 buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size))
488 {
489 POCL_ABORT_UNIMPLEMENTED();
490-#if 0
491- printf("bo=[%d,%d,%d]\n"
492- "ho=[%d,%d,%d]\n"
493- "re=[%d,%d,%d]\n"
494- "bp=[,%d,%d]\n"
495- "hp=[,%d,%d]\n"
496- "bs=[%d]\n",
497- (int)buffer_origin[0], (int)buffer_origin[1], (int)buffer_origin[2],
498- (int)host_origin[0], (int)host_origin[1], (int)host_origin[2],
499- (int)region[0], (int)region[1], (int)region[2],
500- (int)buffer_row_pitch, (int)buffer_slice_pitch,
501- (int)host_row_pitch, (int)host_slice_pitch,
502- (int)buffer->size);
503-#endif
504 return CL_INVALID_VALUE;
505 }
506
507- device_id = command_queue->device;
508+ device = command_queue->device;
509+
510 for (i = 0; i < command_queue->context->num_devices; ++i)
511 {
512- if (command_queue->context->devices[i] == device_id)
513+ if (command_queue->context->devices[i] == device)
514 break;
515 }
516-
517 assert(i < command_queue->context->num_devices);
518
519- device_id->write_rect(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id],
520+
521+ /* execute directly */
522+ /* TODO: enqueue the write_rect if this is a non-blocking read (see
523+ clEnqueueWriteBuffer) */
524+ if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
525+ {
526+ /* wait for the event in event_wait_list to finish */
527+ POCL_ABORT_UNIMPLEMENTED();
528+ }
529+ else
530+ {
531+ /* in-order queue - all previously enqueued commands must
532+ * finish before this read */
533+ // ensure our buffer is not freed yet
534+ clRetainMemObject (buffer);
535+ clFinish(command_queue);
536+ }
537+ POCL_PROFILE_SUBMITTED;
538+ POCL_PROFILE_RUNNING;
539+
540+ /* TODO: offset computation doesn't work in case the ptr is not
541+ a direct pointer */
542+ device->write_rect(device->data, ptr, buffer->device_ptrs[device->dev_id],
543 buffer_origin, host_origin, region,
544 buffer_row_pitch, buffer_slice_pitch,
545 host_row_pitch, host_slice_pitch);
546
547+ POCL_PROFILE_COMPLETE;
548+
549+ clReleaseMemObject (buffer);
550+
551 return CL_SUCCESS;
552 }
553
554=== modified file 'lib/CL/clFinish.c'
555--- lib/CL/clFinish.c 2012-05-31 12:01:27 +0000
556+++ lib/CL/clFinish.c 2012-08-15 20:36:22 +0000
557@@ -41,6 +41,7 @@
558 {
559 case CL_COMMAND_TYPE_READ:
560 POCL_PROFILE_SUBMITTED;
561+ POCL_PROFILE_RUNNING;
562 command_queue->device->read
563 (node->command.read.data,
564 node->command.read.host_ptr,
565@@ -51,6 +52,7 @@
566 break;
567 case CL_COMMAND_TYPE_WRITE:
568 POCL_PROFILE_SUBMITTED;
569+ POCL_PROFILE_RUNNING;
570 command_queue->device->write
571 (node->command.write.data,
572 node->command.write.host_ptr,
573@@ -61,6 +63,7 @@
574 break;
575 case CL_COMMAND_TYPE_COPY:
576 POCL_PROFILE_SUBMITTED;
577+ POCL_PROFILE_RUNNING;
578 command_queue->device->copy
579 (node->command.copy.data,
580 node->command.copy.src_ptr,
581@@ -71,9 +74,11 @@
582 clReleaseMemObject (node->command.copy.dst_buffer);
583 break;
584 case CL_COMMAND_TYPE_RUN:
585- POCL_PROFILE_SUBMITTED;
586 assert (*event == node->event);
587+ POCL_PROFILE_SUBMITTED;
588+ POCL_PROFILE_RUNNING;
589 command_queue->device->run(node->command.run.data, node);
590+ POCL_PROFILE_COMPLETE;
591 for (i = 0; i < node->command.run.arg_buffer_count; ++i)
592 {
593 cl_mem buf = node->command.run.arg_buffers[i];
594
595=== modified file 'lib/CL/clGetDeviceInfo.c'
596--- lib/CL/clGetDeviceInfo.c 2012-08-08 12:45:17 +0000
597+++ lib/CL/clGetDeviceInfo.c 2012-08-15 20:36:22 +0000
598@@ -126,23 +126,17 @@
599 }
600
601 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:
602- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char);
603-
604+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char);
605 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT:
606 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short);
607-
608 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:
609 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int);
610-
611 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:
612 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long);
613-
614 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT:
615 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float);
616-
617 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:
618 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double);
619-
620 case CL_DEVICE_MAX_CLOCK_FREQUENCY :
621 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_clock_frequency);
622 case CL_DEVICE_ADDRESS_BITS :
623@@ -186,7 +180,7 @@
624 case CL_DEVICE_MAX_CONSTANT_ARGS :
625 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_constant_args);
626 case CL_DEVICE_LOCAL_MEM_TYPE :
627- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_size);
628+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_type);
629 case CL_DEVICE_LOCAL_MEM_SIZE:
630 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->local_mem_size);
631 case CL_DEVICE_ERROR_CORRECTION_SUPPORT :
632@@ -218,28 +212,34 @@
633
634 case CL_DEVICE_EXTENSIONS :
635 POCL_RETURN_DEVICE_INFO_STR("cl_khr_fp16");
636- case CL_DEVICE_PLATFORM : break;
637- case CL_DEVICE_DOUBLE_FP_CONFIG : break;
638+ case CL_DEVICE_PLATFORM :
639+ {
640+ /* Return the first platform id, assuming this is the only
641+ platform id (which is currently always the case for pocl) */
642+ cl_platform_id platform_id;
643+ clGetPlatformIDs(1, &platform_id, NULL);
644+ POCL_RETURN_DEVICE_INFO(cl_platform_id, platform_id);
645+ }
646+ case CL_DEVICE_DOUBLE_FP_CONFIG :
647+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->double_fp_config);
648 case CL_DEVICE_HALF_FP_CONFIG : break;
649- case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF : break;
650+ case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF :
651+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_half);
652 case CL_DEVICE_HOST_UNIFIED_MEMORY : break;
653- /* TODO: figure out what the difference between preferred and native
654- widths are. And why there is no struct fields 'native_vector...' */
655 case CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR :
656- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char);
657+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_char);
658 case CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT :
659- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short);
660+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_short);
661 case CL_DEVICE_NATIVE_VECTOR_WIDTH_INT :
662- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int);
663+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_int);
664 case CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG :
665- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long);
666+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_long);
667 case CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT :
668- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float);
669+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_float);
670 case CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE :
671- POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double);
672+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_double);
673 case CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF :
674- /* TODO: why is there no preferred_vector_width_half? */
675- POCL_RETURN_DEVICE_INFO(cl_uint, 0);
676+ POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_half);
677 case CL_DEVICE_OPENCL_C_VERSION :
678 POCL_RETURN_DEVICE_INFO_STR("1.2");
679 }
680
681=== modified file 'lib/CL/clGetKernelWorkGroupInfo.c'
682--- lib/CL/clGetKernelWorkGroupInfo.c 2012-03-28 11:25:45 +0000
683+++ lib/CL/clGetKernelWorkGroupInfo.c 2012-08-15 20:36:22 +0000
684@@ -65,7 +65,7 @@
685 POCL_ABORT_UNIMPLEMENTED();
686
687 case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE:
688- POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple);
689+ POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple);
690
691 case CL_KERNEL_LOCAL_MEM_SIZE:
692 POCL_RETURN_KERNEL_WG_INFO(cl_ulong, device->local_mem_size);
693
694=== modified file 'lib/CL/clReleaseProgram.c'
695--- lib/CL/clReleaseProgram.c 2012-05-31 12:01:27 +0000
696+++ lib/CL/clReleaseProgram.c 2012-08-15 20:36:22 +0000
697@@ -58,7 +58,7 @@
698 free (program->binary_sizes);
699
700 env = getenv ("POCL_LEAVE_TEMP_DIRS");
701- if (!(env != NULL && strlen (env) == 1 && env[0] == '1') &&
702+ if (!(env != NULL && strcmp (env, "1") == 0) &&
703 getenv("POCL_TEMP_DIR") == NULL)
704 {
705 remove_directory (program->temp_dir);
706
707=== modified file 'lib/CL/devices/basic/basic.h'
708--- lib/CL/devices/basic/basic.h 2012-08-08 14:54:32 +0000
709+++ lib/CL/devices/basic/basic.h 2012-08-15 20:36:22 +0000
710@@ -46,6 +46,15 @@
711 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
712 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
713 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
714+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \
715+ /* TODO: figure out what the difference between preferred and native widths are. */ \
716+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \
717+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \
718+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \
719+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
720+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
721+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
722+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \
723 0, /* max_clock_frequency */ \
724 0, /* address_bits */ \
725 0, /* max_mem_alloc_size */ \
726@@ -62,6 +71,7 @@
727 0, /* mem_base_addr_align */ \
728 0, /* min_data_type_align_size */ \
729 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \
730+ CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \
731 CL_NONE, /* global_mem_cache_type */ \
732 0, /* global_mem_cacheline_size */ \
733 0, /* global_mem_cache_size */ \
734
735=== modified file 'lib/CL/devices/common.h'
736--- lib/CL/devices/common.h 2012-04-24 13:12:25 +0000
737+++ lib/CL/devices/common.h 2012-08-15 20:36:22 +0000
738@@ -70,5 +70,6 @@
739 #endif
740 /* Half is internally represented as short */
741 #define POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT
742+#define POCL_DEVICES_NATIVE_VECTOR_WIDTH_HALF POCL_DEVICES_NATIVE_VECTOR_WIDTH_SHORT
743
744 #endif
745
746=== modified file 'lib/CL/devices/devices.c'
747--- lib/CL/devices/devices.c 2012-05-31 12:01:27 +0000
748+++ lib/CL/devices/devices.c 2012-08-15 20:36:22 +0000
749@@ -57,7 +57,7 @@
750 pocl_init_devices()
751 {
752 const char *device_list;
753- char *ptr, *tofree, *token, *saveptr, *saveptr2;
754+ char *ptr, *tofree, *token, *saveptr;
755 int i, devcount;
756 if (pocl_num_devices > 0)
757 return;
758@@ -83,7 +83,7 @@
759
760 ptr = tofree = strdup(device_list);
761 devcount = 0;
762- while ((token = strtok_r (ptr, " ", &saveptr2)) != NULL)
763+ while ((token = strtok_r (ptr, " ", &saveptr)) != NULL)
764 {
765 struct _cl_device_id* device_type = NULL;
766
767
768=== modified file 'lib/CL/devices/pthread/pocl-pthread.h'
769--- lib/CL/devices/pthread/pocl-pthread.h 2012-08-08 14:54:32 +0000
770+++ lib/CL/devices/pthread/pocl-pthread.h 2012-08-15 20:36:22 +0000
771@@ -50,6 +50,15 @@
772 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
773 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
774 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
775+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF, /* preferred_vector_width_half */ \
776+ /* TODO: figure out what the difference between preferred and native widths are. */ \
777+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \
778+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \
779+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \
780+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
781+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
782+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
783+ POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \
784 0, /* max_clock_frequency */ \
785 0, /* address_bits */ \
786 0, /* max_mem_alloc_size */ \
787@@ -66,6 +75,7 @@
788 0, /* mem_base_addr_align */ \
789 0, /* min_data_type_align_size */ \
790 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \
791+ CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \
792 CL_NONE, /* global_mem_cache_type */ \
793 0, /* global_mem_cacheline_size */ \
794 0, /* global_mem_cache_size */ \
795
796=== modified file 'lib/CL/pocl_cl.h'
797--- lib/CL/pocl_cl.h 2012-08-08 14:54:32 +0000
798+++ lib/CL/pocl_cl.h 2012-08-15 20:36:22 +0000
799@@ -25,6 +25,7 @@
800 #define POCL_CL_H
801
802 #include "config.h"
803+#include <assert.h>
804 #include <stdio.h>
805 #include <ltdl.h>
806 #include <pthread.h>
807@@ -150,6 +151,14 @@
808 cl_uint preferred_vector_width_long;
809 cl_uint preferred_vector_width_float;
810 cl_uint preferred_vector_width_double;
811+ cl_uint preferred_vector_width_half;
812+ cl_uint native_vector_width_char;
813+ cl_uint native_vector_width_short;
814+ cl_uint native_vector_width_int;
815+ cl_uint native_vector_width_long;
816+ cl_uint native_vector_width_float;
817+ cl_uint native_vector_width_double;
818+ cl_uint native_vector_width_half;
819 cl_uint max_clock_frequency;
820 cl_uint address_bits;
821 cl_ulong max_mem_alloc_size;
822@@ -166,6 +175,7 @@
823 cl_uint mem_base_addr_align;
824 cl_uint min_data_type_align_size;
825 cl_device_fp_config single_fp_config;
826+ cl_device_fp_config double_fp_config;
827 cl_device_mem_cache_type global_mem_cache_type;
828 cl_uint global_mem_cacheline_size;
829 cl_ulong global_mem_cache_size;
830@@ -394,12 +404,12 @@
831 } \
832 } while (0) \
833
834-
835 #define POCL_PROFILE_SUBMITTED \
836 do { \
837 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \
838 event != NULL && (*event) != NULL) \
839 { \
840+ assert((*event)->status = CL_QUEUED); \
841 (*event)->status = CL_SUBMITTED; \
842 (*event)->time_submit = \
843 command_queue->device->get_timer_value(command_queue->device->data); \
844@@ -411,6 +421,7 @@
845 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \
846 event != NULL && (*event) != NULL) \
847 { \
848+ assert((*event)->status = CL_SUBMITTED); \
849 (*event)->status = CL_RUNNING; \
850 (*event)->time_start = \
851 command_queue->device->get_timer_value(command_queue->device->data); \
852@@ -422,6 +433,7 @@
853 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \
854 event != NULL && (*event) != NULL) \
855 { \
856+ assert((*event)->status = CL_RUNNING); \
857 (*event)->status = CL_COMPLETE; \
858 (*event)->time_end = \
859 command_queue->device->get_timer_value(command_queue->device->data); \
860
861=== added file 'lib/kernel/atomics.cl'
862--- lib/kernel/atomics.cl 1970-01-01 00:00:00 +0000
863+++ lib/kernel/atomics.cl 2012-08-15 20:36:22 +0000
864@@ -0,0 +1,149 @@
865+/* OpenCL built-in library: atomic operations
866+
867+ Copyright (c) 2012 Universidad Rey Juan Carlos
868+
869+ Permission is hereby granted, free of charge, to any person obtaining a copy
870+ of this software and associated documentation files (the "Software"), to deal
871+ in the Software without restriction, including without limitation the rights
872+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
873+ copies of the Software, and to permit persons to whom the Software is
874+ furnished to do so, subject to the following conditions:
875+
876+ The above copyright notice and this permission notice shall be included in
877+ all copies or substantial portions of the Software.
878+
879+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
880+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
881+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
882+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
883+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
884+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
885+ THE SOFTWARE.
886+*/
887+
888+
889+
890+// Repeat the content of this file several times with different values
891+// for Q, T, and U:
892+#if !defined(Q)
893+
894+# define Q __global
895+# include "atomics.cl"
896+# undef Q
897+
898+# define Q __local
899+# include "atomics.cl"
900+# undef Q
901+
902+#elif !defined(T)
903+
904+# define T int
905+# define MIN __sync_fetch_and_min
906+# define MAX __sync_fetch_and_max
907+# include "atomics.cl"
908+# undef T
909+# undef MIN
910+# undef MAX
911+
912+# define T uint
913+# define MIN __sync_fetch_and_umin
914+# define MAX __sync_fetch_and_umax
915+# include "atomics.cl"
916+# undef T
917+# undef MIN
918+# undef MAX
919+
920+
921+// xchg is also supported for float as a special case
922+__attribute__((overloadable))
923+float atomic_xchg(volatile Q float *p, float val)
924+{
925+ // NOTE: We compare the float as int here...
926+ return __atomic_exchange_n((volatile int*)p, val, __ATOMIC_RELAXED);
927+}
928+
929+#else
930+
931+
932+
933+// basic
934+
935+// read, add, store
936+__attribute__((overloadable))
937+T atomic_add(volatile Q T *p, T val)
938+{
939+ return __sync_fetch_and_add((volatile T*)p, val, __ATOMIC_RELAXED);
940+}
941+
942+// read, subtract, store
943+__attribute__((overloadable))
944+T atomic_sub(volatile Q T *p, T val)
945+{
946+ return __sync_fetch_and_sub(p, val, __ATOMIC_RELAXED);
947+}
948+
949+// read, swap, store
950+__attribute__((overloadable))
951+T atomic_xchg(volatile Q T *p, T val)
952+{
953+ return __atomic_exchange_n(p, val, __ATOMIC_RELAXED);
954+}
955+
956+// read, increment, store
957+__attribute__((overloadable))
958+T atomic_inc(volatile Q T *p)
959+{
960+ return atomic_add(p, (T)1);
961+}
962+
963+// read, decrement, store
964+__attribute__((overloadable))
965+T atomic_dec(volatile Q T *p)
966+{
967+ return atomic_sub(p, (T)1);
968+}
969+
970+// read, store
971+__attribute__((overloadable))
972+T atomic_cmpxchg(volatile Q T *p, T cmp, T val)
973+{
974+ __atomic_compare_exchange_n(p, &cmp, val, false,
975+ __ATOMIC_RELAXED, __ATOMIC_RELAXED);
976+ return cmp;
977+}
978+
979+// extended
980+
981+__attribute__((overloadable))
982+T atomic_min(volatile Q T *p, T val)
983+{
984+ return MIN((volatile T*)p, val);
985+}
986+
987+__attribute__((overloadable))
988+T atomic_max(volatile Q T *p, T val)
989+{
990+ return MAX((volatile T*)p, val);
991+}
992+
993+__attribute__((overloadable))
994+T atomic_and(volatile Q T *p, T val)
995+{
996+ return __sync_fetch_and_and(p, val, __ATOMIC_RELAXED);
997+}
998+
999+__attribute__((overloadable))
1000+T atomic_or(volatile Q T *p, T val)
1001+{
1002+ return __sync_fetch_and_or(p, val, __ATOMIC_RELAXED);
1003+}
1004+
1005+__attribute__((overloadable))
1006+T atomic_xor(volatile Q T *p, T val)
1007+{
1008+ return __sync_fetch_and_xor(p, val, __ATOMIC_RELAXED);
1009+}
1010+
1011+
1012+
1013+#endif
1014
1015=== modified file 'lib/kernel/hadd.cl'
1016--- lib/kernel/hadd.cl 2011-10-26 13:03:37 +0000
1017+++ lib/kernel/hadd.cl 2012-08-15 20:36:22 +0000
1018@@ -23,7 +23,5 @@
1019
1020 #include "templates.h"
1021
1022-// This could do with some testing
1023-// This could probably also be optimised (i.e. the ?: operators eliminated)
1024 DEFINE_EXPR_G_GG(hadd,
1025 (a >> (sgtype)1) + (b >> (sgtype)1) + (a & b & (gtype)1))
1026
1027=== modified file 'lib/kernel/rhadd.cl'
1028--- lib/kernel/rhadd.cl 2011-10-26 13:03:37 +0000
1029+++ lib/kernel/rhadd.cl 2012-08-15 20:36:22 +0000
1030@@ -23,7 +23,5 @@
1031
1032 #include "templates.h"
1033
1034-// This could do with some testing
1035-// This could probably also be optimised (i.e. the ?: operators eliminated)
1036 DEFINE_EXPR_G_GG(rhadd,
1037 (a >> (sgtype)1) + (b >> (sgtype)1) + ((a | b) & (gtype)1))
1038
1039=== modified file 'lib/kernel/sources.mk'
1040--- lib/kernel/sources.mk 2012-06-04 12:15:18 +0000
1041+++ lib/kernel/sources.mk 2012-08-15 20:36:22 +0000
1042@@ -1,144 +1,146 @@
1043 # Nodist here because these files should be included
1044 # to the distribution only once, from the root kernel
1045 # makefile.
1046-nodist_libkernel_a_SOURCES = templates.h \
1047- barrier.ll \
1048- image.h \
1049- get_work_dim.c \
1050- get_global_size.c \
1051- get_global_id.c \
1052- get_local_size.c \
1053- get_local_id.c \
1054- get_num_groups.c \
1055- get_group_id.c \
1056- get_global_offset.c \
1057- as_type.cl \
1058- convert_type.cl \
1059- acos.cl \
1060- acosh.cl \
1061- acospi.cl \
1062- asin.cl \
1063- asinh.cl \
1064- asinpi.cl \
1065- atan.cl \
1066- atan2.cl \
1067- atan2pi.cl \
1068- atanh.cl \
1069- atanpi.cl \
1070- cbrt.cl \
1071- ceil.cl \
1072- copysign.cl \
1073- cos.cl \
1074- cosh.cl \
1075- cospi.cl \
1076- erfc.cl \
1077- erf.cl \
1078- exp.cl \
1079- exp2.cl \
1080- exp10.cl \
1081- expm1.cl \
1082- fabs.cl \
1083- fdim.cl \
1084- floor.cl \
1085- fma.cl \
1086- fmax.cl \
1087- fmin.cl \
1088- fmod.cl \
1089- fract.cl \
1090- hypot.cl \
1091- ilogb.cl \
1092- ldexp.cl \
1093- lgamma.cl \
1094- log.cl \
1095- log2.cl \
1096- log10.cl \
1097- log1p.cl \
1098- logb.cl \
1099- mad.cl \
1100- maxmag.cl \
1101- minmag.cl \
1102- nan.cl \
1103- nextafter.cl \
1104- pow.cl \
1105- pown.cl \
1106- powr.cl \
1107- remainder.cl \
1108- rint.cl \
1109- rootn.cl \
1110- round.cl \
1111- rsqrt.cl \
1112- sin.cl \
1113- sincos.cl \
1114- sinh.cl \
1115- sinpi.cl \
1116- sqrt.cl \
1117- tan.cl \
1118- tanh.cl \
1119- tanpi.cl \
1120- tgamma.cl \
1121- trunc.cl \
1122- divide.cl \
1123- recip.cl \
1124- abs.cl \
1125- abs_diff.cl \
1126- add_sat.cl \
1127- hadd.cl \
1128- rhadd.cl \
1129- clamp.cl \
1130- clz.cl \
1131- mad_hi.cl \
1132- mad_sat.cl \
1133- max.cl \
1134- min.cl \
1135- mul_hi.cl \
1136- rotate.cl \
1137- sub_sat.cl \
1138- upsample.cl \
1139- popcount.cl \
1140- mad24.cl \
1141- mul24.cl \
1142- degrees.cl \
1143- mix.cl \
1144- radians.cl \
1145- step.cl \
1146- smoothstep.cl \
1147- sign.cl \
1148- cross.cl \
1149- dot.cl \
1150- distance.cl \
1151- length.cl \
1152- normalize.cl \
1153- fast_distance.cl \
1154- fast_length.cl \
1155- fast_normalize.cl \
1156- isequal.cl \
1157- isnotequal.cl \
1158- isgreater.cl \
1159- isgreaterequal.cl \
1160- isless.cl \
1161- islessequal.cl \
1162- islessgreater.cl \
1163- isfinite.cl \
1164- isinf.cl \
1165- isnan.cl \
1166- isnormal.cl \
1167- isordered.cl \
1168- isunordered.cl \
1169- signbit.cl \
1170- any.cl \
1171- all.cl \
1172- bitselect.cl \
1173- select.cl \
1174- vload.cl \
1175- vstore.cl \
1176- vload_half.cl \
1177- vstore_half.cl \
1178- async_work_group_copy.cl \
1179- wait_group_events.cl \
1180- read_image.cl \
1181- write_image.cl \
1182- get_image_width.cl \
1183- get_image_height.cl
1184+nodist_libkernel_a_SOURCES = \
1185+ templates.h \
1186+ barrier.ll \
1187+ image.h \
1188+ get_work_dim.c \
1189+ get_global_size.c \
1190+ get_global_id.c \
1191+ get_local_size.c \
1192+ get_local_id.c \
1193+ get_num_groups.c \
1194+ get_group_id.c \
1195+ get_global_offset.c \
1196+ as_type.cl \
1197+ atomics.cl \
1198+ convert_type.cl \
1199+ acos.cl \
1200+ acosh.cl \
1201+ acospi.cl \
1202+ asin.cl \
1203+ asinh.cl \
1204+ asinpi.cl \
1205+ atan.cl \
1206+ atan2.cl \
1207+ atan2pi.cl \
1208+ atanh.cl \
1209+ atanpi.cl \
1210+ cbrt.cl \
1211+ ceil.cl \
1212+ copysign.cl \
1213+ cos.cl \
1214+ cosh.cl \
1215+ cospi.cl \
1216+ erfc.cl \
1217+ erf.cl \
1218+ exp.cl \
1219+ exp2.cl \
1220+ exp10.cl \
1221+ expm1.cl \
1222+ fabs.cl \
1223+ fdim.cl \
1224+ floor.cl \
1225+ fma.cl \
1226+ fmax.cl \
1227+ fmin.cl \
1228+ fmod.cl \
1229+ fract.cl \
1230+ hypot.cl \
1231+ ilogb.cl \
1232+ ldexp.cl \
1233+ lgamma.cl \
1234+ log.cl \
1235+ log2.cl \
1236+ log10.cl \
1237+ log1p.cl \
1238+ logb.cl \
1239+ mad.cl \
1240+ maxmag.cl \
1241+ minmag.cl \
1242+ nan.cl \
1243+ nextafter.cl \
1244+ pow.cl \
1245+ pown.cl \
1246+ powr.cl \
1247+ remainder.cl \
1248+ rint.cl \
1249+ rootn.cl \
1250+ round.cl \
1251+ rsqrt.cl \
1252+ sin.cl \
1253+ sincos.cl \
1254+ sinh.cl \
1255+ sinpi.cl \
1256+ sqrt.cl \
1257+ tan.cl \
1258+ tanh.cl \
1259+ tanpi.cl \
1260+ tgamma.cl \
1261+ trunc.cl \
1262+ divide.cl \
1263+ recip.cl \
1264+ abs.cl \
1265+ abs_diff.cl \
1266+ add_sat.cl \
1267+ hadd.cl \
1268+ rhadd.cl \
1269+ clamp.cl \
1270+ clz.cl \
1271+ mad_hi.cl \
1272+ mad_sat.cl \
1273+ max.cl \
1274+ min.cl \
1275+ mul_hi.cl \
1276+ rotate.cl \
1277+ sub_sat.cl \
1278+ upsample.cl \
1279+ popcount.cl \
1280+ mad24.cl \
1281+ mul24.cl \
1282+ degrees.cl \
1283+ mix.cl \
1284+ radians.cl \
1285+ step.cl \
1286+ smoothstep.cl \
1287+ sign.cl \
1288+ cross.cl \
1289+ dot.cl \
1290+ distance.cl \
1291+ length.cl \
1292+ normalize.cl \
1293+ fast_distance.cl \
1294+ fast_length.cl \
1295+ fast_normalize.cl \
1296+ isequal.cl \
1297+ isnotequal.cl \
1298+ isgreater.cl \
1299+ isgreaterequal.cl \
1300+ isless.cl \
1301+ islessequal.cl \
1302+ islessgreater.cl \
1303+ isfinite.cl \
1304+ isinf.cl \
1305+ isnan.cl \
1306+ isnormal.cl \
1307+ isordered.cl \
1308+ isunordered.cl \
1309+ signbit.cl \
1310+ any.cl \
1311+ all.cl \
1312+ bitselect.cl \
1313+ select.cl \
1314+ vload.cl \
1315+ vstore.cl \
1316+ vload_half.cl \
1317+ vstore_half.cl \
1318+ async_work_group_copy.cl \
1319+ wait_group_events.cl \
1320+ read_image.cl \
1321+ write_image.cl \
1322+ get_image_width.cl \
1323+ get_image_height.cl
1324
1325 barrier.o: barrier.ll
1326 $(LLVM_AS) -o $@ $<
1327
1328=== modified file 'lib/kernel/sub_sat.cl'
1329--- lib/kernel/sub_sat.cl 2011-10-26 19:49:23 +0000
1330+++ lib/kernel/sub_sat.cl 2012-08-15 20:36:22 +0000
1331@@ -30,8 +30,6 @@
1332 // ushort __builtin_ia32_psubusw128
1333 // Other types don't seem to be supported.
1334
1335-// This could do with some testing
1336-// This could probably also be optimised (i.e. the ?: operators eliminated)
1337 DEFINE_EXPR_G_GG(sub_sat,
1338 (sgtype)-1 < (sgtype)0 ?
1339 /* signed */
1340
1341=== modified file 'lib/llvmopencl/Makefile.am'
1342--- lib/llvmopencl/Makefile.am 2012-06-15 12:26:40 +0000
1343+++ lib/llvmopencl/Makefile.am 2012-08-15 20:36:22 +0000
1344@@ -40,4 +40,4 @@
1345 ImplicitLoopBarriers.h ImplicitLoopBarriers.cc \
1346 WorkItemAliasAnalysis.cc WIVectorize.cc
1347
1348-#llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@
1349\ No newline at end of file
1350+#llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@
1351
1352=== modified file 'tests/regression/test_constant_array.cpp'
1353--- tests/regression/test_constant_array.cpp 2012-08-02 14:18:02 +0000
1354+++ tests/regression/test_constant_array.cpp 2012-08-15 20:36:22 +0000
1355@@ -29,6 +29,7 @@
1356 #include <cstdio>
1357 #include <cstdlib>
1358 #include <iostream>
1359+#include <unistd.h>
1360
1361 #define WORK_ITEMS 1
1362
1363
1364=== modified file 'tests/regression/test_infinite_loop.cpp'
1365--- tests/regression/test_infinite_loop.cpp 2012-08-02 14:18:02 +0000
1366+++ tests/regression/test_infinite_loop.cpp 2012-08-15 20:36:22 +0000
1367@@ -28,6 +28,7 @@
1368 #include <cstdio>
1369 #include <cstdlib>
1370 #include <iostream>
1371+#include <unistd.h>
1372
1373 #define WORK_ITEMS 1
1374