Merge lp:~schnetter/pocl/pocl into lp:~pocl/pocl/trunk

Proposed by Erik Schnetter
Status: Merged
Merged at revision: 335
Proposed branch: lp:~schnetter/pocl/pocl
Merge into: lp:~pocl/pocl/trunk
Diff against target: 1371 lines (+573/-231)
27 files modified
INSTALL (+5/-1)
include/_kernel.h (+35/-0)
lib/CL/clEnqueueCopyBuffer.c (+18/-2)
lib/CL/clEnqueueCopyBufferRect.c (+44/-1)
lib/CL/clEnqueueNDRangeKernel.c (+2/-0)
lib/CL/clEnqueueReadBuffer.c (+15/-5)
lib/CL/clEnqueueReadBufferRect.c (+52/-12)
lib/CL/clEnqueueUnmapMemObject.c (+1/-1)
lib/CL/clEnqueueWriteBuffer.c (+12/-17)
lib/CL/clEnqueueWriteBufferRect.c (+32/-20)
lib/CL/clFinish.c (+6/-1)
lib/CL/clGetDeviceInfo.c (+21/-21)
lib/CL/clGetKernelWorkGroupInfo.c (+1/-1)
lib/CL/clReleaseProgram.c (+1/-1)
lib/CL/devices/basic/basic.h (+10/-0)
lib/CL/devices/common.h (+1/-0)
lib/CL/devices/devices.c (+2/-2)
lib/CL/devices/pthread/pocl-pthread.h (+10/-0)
lib/CL/pocl_cl.h (+13/-1)
lib/kernel/atomics.cl (+149/-0)
lib/kernel/hadd.cl (+0/-2)
lib/kernel/rhadd.cl (+0/-2)
lib/kernel/sources.mk (+140/-138)
lib/kernel/sub_sat.cl (+0/-2)
lib/llvmopencl/Makefile.am (+1/-1)
tests/regression/test_constant_array.cpp (+1/-0)
tests/regression/test_infinite_loop.cpp (+1/-0)
To merge this branch: bzr merge lp:~schnetter/pocl/pocl
Reviewer Review Type Date Requested Status
pocl maintaners Pending
Review via email: mp+119393@code.launchpad.net

Description of the change

I corrected several build errors (and some cosmetic issues) for Mac OS X.

To post a comment you must log in.
Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Fixes should be committed to the pocl/0.6 branch first and then 0.6 merged to trunk so we get fixes to the 0.6 release. Otherwise, looks good to me. Please commit to the lp:pocl/0.6 branch directly (which we can merge to trunk later). I'm probably going to roll a yet another 0.6 release candidate package later this week.

Revision history for this message
Erik Schnetter (schnetter) wrote :

I'm still relatively unfamiliar with bzr. How exactly would I "commit directly to 0.6"? I assume I check out the 0.6 branch first, and then what? Would "bzr merge -i" do the trick that allows me to select the fixes?

Should I commit my cosmetic changes as well, or should I commit those to the trunk only?

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlocation or bzr merge -rFIRSTREV..LASTREV ../yourbranchlocation or similar) then commit (if a bound branch, just bzr commit in the 0.6 tree). Though, cherry picking book keeping is not the best in Bazaar, and IIRC the '-c' that selects a single rev is the most robust in "backporting" revisions. I cannot recall the exact problems involved but I've bumped into them in the past.

IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/regressions.

Revision history for this message
Erik Schnetter (schnetter) wrote :

I cherry-picked the changes into a local checkout of the 0.6 branch.
However, it appears as if I only received the changes, not the
commits? I now have a large number of "locally modified" files... Is
this to be expected? How should I handle this?

-erik

On Mon, Aug 13, 2012 at 11:33 PM, Pekka Jääskeläinen
<email address hidden> wrote:
> Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlocation or bzr merge -rFIRSTREV..LASTREV ../yourbranchlocation or similar) then commit (if a bound branch, just bzr commit in the 0.6 tree). Though, cherry picking book keeping is not the best in Bazaar, and IIRC the '-c' that selects a single rev is the most robust in "backporting" revisions. I cannot recall the exact problems involved but I've bumped into them in the past.
>
> IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/regressions.
> --
> https://code.launchpad.net/~schnetter/pocl/pocl/+merge/119393
> You are the owner of lp:~schnetter/pocl/pocl.

--
Erik Schnetter <email address hidden>
http://www.perimeterinstitute.ca/personal/eschnetter/

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

That's normal. The "cherry picking" of Bazaar is not perfect in this regard. http://doc.bazaar.canonical.com/latest/en/user-guide/adv_merging.html

lp:~schnetter/pocl/pocl updated
334. By Erik Schnetter

Correct several build errors for Mac OS X

Revision history for this message
Erik Schnetter (schnetter) wrote :

I committed my changes (except adding the atomics) to the 0.6 branch, then merged the branch into the trunk.

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

I can't see them in 0.6. Yep, better put the atomics to trunk, let's try to keep 0.6 in feature freeze.

lp:~schnetter/pocl/pocl updated
335. By Erik Schnetter

Implement atomics

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'INSTALL'
--- INSTALL 2012-08-08 12:45:17 +0000
+++ INSTALL 2012-08-15 20:36:22 +0000
@@ -36,10 +36,14 @@
36 ./configure --enable-shared --prefix=YOUR_INSTALLATION_PREFIX_HERE 36 ./configure --enable-shared --prefix=YOUR_INSTALLATION_PREFIX_HERE
37 make REQUIRES_RTTI=1 && make install37 make REQUIRES_RTTI=1 && make install
3838
39
40!!!NOTE: A 64-bit host is at the moment required. See39!!!NOTE: A 64-bit host is at the moment required. See
41https://bugs.launchpad.net/pocl/+bug/91191140https://bugs.launchpad.net/pocl/+bug/911911
4241
42!!!NOTE: On Mac OS X, you cannot build LLVM with LLVM; you need to
43build using gcc. Use e.g. this command to configure instead:
44
45 ./configure --enable-shared CC=gcc CXX=g++ --prefix=YOUR_INSTALLATION_PREFIX_HERE
46
43After all the requirements are installed. The installation procedure47After all the requirements are installed. The installation procedure
44follows the usual autotools build+install. If you are using a development48follows the usual autotools build+install. If you are using a development
45source tree, you need to generate the autotool build files with 49source tree, you need to generate the autotool build files with
4650
=== modified file 'include/_kernel.h'
--- include/_kernel.h 2012-05-25 16:05:51 +0000
+++ include/_kernel.h 2012-08-15 20:36:22 +0000
@@ -1608,6 +1608,41 @@
1608#endif1608#endif
16091609
16101610
16111611
1612/* Atomic operations */
1613
1614#define _CL_DECLARE_ATOMICS(MOD, TYPE) \
1615 _cl_overloadable TYPE atomic_add (volatile MOD TYPE *p, TYPE val); \
1616 _cl_overloadable TYPE atomic_sub (volatile MOD TYPE *p, TYPE val); \
1617 _cl_overloadable TYPE atomic_xchg (volatile MOD TYPE *p, TYPE val); \
1618 _cl_overloadable TYPE atomic_inc (volatile MOD TYPE *p); \
1619 _cl_overloadable TYPE atomic_dec (volatile MOD TYPE *p); \
1620 _cl_overloadable TYPE atomic_cmpxchg(volatile MOD TYPE *p, TYPE cmp, TYPE val); \
1621 _cl_overloadable TYPE atomic_min (volatile MOD TYPE *p, TYPE val); \
1622 _cl_overloadable TYPE atomic_max (volatile MOD TYPE *p, TYPE val); \
1623 _cl_overloadable TYPE atomic_and (volatile MOD TYPE *p, TYPE val); \
1624 _cl_overloadable TYPE atomic_or (volatile MOD TYPE *p, TYPE val); \
1625 _cl_overloadable TYPE atomic_xor (volatile MOD TYPE *p, TYPE val);
1626_CL_DECLARE_ATOMICS(__global, int )
1627_CL_DECLARE_ATOMICS(__global, uint)
1628_CL_DECLARE_ATOMICS(__local , int )
1629_CL_DECLARE_ATOMICS(__local , uint)
1630
1631_cl_overloadable float atomic_xchg(volatile __global float *p, float val);
1632_cl_overloadable float atomic_xchg(volatile __local float *p, float val);
1633
1634#define atom_add atomic_add
1635#define atom_sub atomic_sub
1636#define atom_xchg atomic_xchg
1637#define atom_inc atomic_inc
1638#define atom_dec atomic_dec
1639#define atom_cmpxchg atomic_cmpxchg
1640#define atom_min atomic_min
1641#define atom_max atomic_max
1642#define atom_and atomic_and
1643#define atom_or atomic_or
1644#define atom_xor atomic_xor
1645
1646
16121647
1613/* Miscellaneous Vector Functions */1648/* Miscellaneous Vector Functions */
16141649
1615// This code leads to an ICE in Clang1650// This code leads to an ICE in Clang
16161651
=== modified file 'lib/CL/clEnqueueCopyBuffer.c'
--- lib/CL/clEnqueueCopyBuffer.c 2012-05-14 11:45:48 +0000
+++ lib/CL/clEnqueueCopyBuffer.c 2012-08-15 20:36:22 +0000
@@ -23,6 +23,7 @@
23*/23*/
2424
25#include "pocl_cl.h"25#include "pocl_cl.h"
26#include "pocl_icd.h"
26#include "utlist.h"27#include "utlist.h"
27#include <assert.h>28#include <assert.h>
2829
@@ -55,14 +56,28 @@
55 return CL_INVALID_VALUE;56 return CL_INVALID_VALUE;
5657
57 device_id = command_queue->device;58 device_id = command_queue->device;
59
58 for (i = 0; i < command_queue->context->num_devices; ++i)60 for (i = 0; i < command_queue->context->num_devices; ++i)
59 {61 {
60 if (command_queue->context->devices[i] == device_id)62 if (command_queue->context->devices[i] == device_id)
61 break;63 break;
62 }64 }
63
64 assert(i < command_queue->context->num_devices);65 assert(i < command_queue->context->num_devices);
6566
67 if (event != NULL)
68 {
69 *event = (cl_event)malloc(sizeof(struct _cl_event));
70 if (*event == NULL)
71 return CL_OUT_OF_HOST_MEMORY;
72 POCL_INIT_OBJECT(*event);
73 (*event)->queue = command_queue;
74 POCL_INIT_ICD_OBJECT(*event);
75 clRetainCommandQueue (command_queue);
76
77 POCL_PROFILE_QUEUED;
78 }
79
80
66 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));81 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));
67 if (cmd == NULL)82 if (cmd == NULL)
68 return CL_OUT_OF_HOST_MEMORY;83 return CL_OUT_OF_HOST_MEMORY;
@@ -81,8 +96,9 @@
81 cmd->command.copy.dst_ptr = dst_buffer->device_ptrs[device_id->dev_id] + dst_offset;96 cmd->command.copy.dst_ptr = dst_buffer->device_ptrs[device_id->dev_id] + dst_offset;
82 cmd->command.copy.cb = cb;97 cmd->command.copy.cb = cb;
83 cmd->next = NULL;98 cmd->next = NULL;
99 cmd->event = event ? *event : NULL;
84100
85 LL_APPEND(command_queue->root, cmd );101 LL_APPEND(command_queue->root, cmd);
86102
87 return CL_SUCCESS;103 return CL_SUCCESS;
88}104}
89105
=== modified file 'lib/CL/clEnqueueCopyBufferRect.c'
--- lib/CL/clEnqueueCopyBufferRect.c 2012-05-14 11:45:48 +0000
+++ lib/CL/clEnqueueCopyBufferRect.c 2012-08-15 20:36:22 +0000
@@ -22,6 +22,7 @@
22*/22*/
2323
24#include "pocl_cl.h"24#include "pocl_cl.h"
25#include "pocl_icd.h"
25#include <assert.h>26#include <assert.h>
2627
27CL_API_ENTRY cl_int CL_API_CALL28CL_API_ENTRY cl_int CL_API_CALL
@@ -69,14 +70,51 @@
69 return CL_INVALID_VALUE;70 return CL_INVALID_VALUE;
7071
71 device_id = command_queue->device;72 device_id = command_queue->device;
73
72 for (i = 0; i < command_queue->context->num_devices; ++i)74 for (i = 0; i < command_queue->context->num_devices; ++i)
73 {75 {
74 if (command_queue->context->devices[i] == device_id)76 if (command_queue->context->devices[i] == device_id)
75 break;77 break;
76 }78 }
77
78 assert(i < command_queue->context->num_devices);79 assert(i < command_queue->context->num_devices);
7980
81 if (event != NULL)
82 {
83 *event = (cl_event)malloc(sizeof(struct _cl_event));
84 if (*event == NULL)
85 return CL_OUT_OF_HOST_MEMORY;
86 POCL_INIT_OBJECT(*event);
87 (*event)->queue = command_queue;
88 POCL_INIT_ICD_OBJECT(*event);
89
90 clRetainCommandQueue (command_queue);
91
92 POCL_PROFILE_QUEUED;
93 }
94
95
96 /* execute directly */
97 /* TODO: enqueue the read_rect if this is a non-blocking read (see
98 clEnqueueReadBuffer) */
99 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
100 {
101 /* wait for the event in event_wait_list to finish */
102 POCL_ABORT_UNIMPLEMENTED();
103 }
104 else
105 {
106 /* in-order queue - all previously enqueued commands must
107 * finish before this read */
108 // ensure our buffer is not freed yet
109 clRetainMemObject (src_buffer);
110 clRetainMemObject (dst_buffer);
111 clFinish(command_queue);
112 }
113 POCL_PROFILE_SUBMITTED;
114 POCL_PROFILE_RUNNING;
115
116 /* TODO: offset computation doesn't work in case the ptr is not
117 a direct pointer */
80 device_id->copy_rect(device_id->data,118 device_id->copy_rect(device_id->data,
81 src_buffer->device_ptrs[device_id->dev_id], 119 src_buffer->device_ptrs[device_id->dev_id],
82 dst_buffer->device_ptrs[device_id->dev_id],120 dst_buffer->device_ptrs[device_id->dev_id],
@@ -84,5 +122,10 @@
84 src_row_pitch, src_slice_pitch,122 src_row_pitch, src_slice_pitch,
85 dst_row_pitch, dst_slice_pitch);123 dst_row_pitch, dst_slice_pitch);
86124
125 POCL_PROFILE_COMPLETE;
126
127 clReleaseMemObject (src_buffer);
128 clReleaseMemObject (dst_buffer);
129
87 return CL_SUCCESS;130 return CL_SUCCESS;
88}131}
89132
=== modified file 'lib/CL/clEnqueueNDRangeKernel.c'
--- lib/CL/clEnqueueNDRangeKernel.c 2012-05-30 14:10:44 +0000
+++ lib/CL/clEnqueueNDRangeKernel.c 2012-08-15 20:36:22 +0000
@@ -314,6 +314,8 @@
314 }314 }
315 }315 }
316316
317 command_node->event = event ? *event : NULL;
318
317 LL_APPEND(command_queue->root, command_node);319 LL_APPEND(command_queue->root, command_node);
318320
319 return CL_SUCCESS;321 return CL_SUCCESS;
320322
=== modified file 'lib/CL/clEnqueueReadBuffer.c'
--- lib/CL/clEnqueueReadBuffer.c 2012-05-29 12:16:56 +0000
+++ lib/CL/clEnqueueReadBuffer.c 2012-08-15 20:36:22 +0000
@@ -38,6 +38,7 @@
38 cl_event *event) CL_API_SUFFIX__VERSION_1_038 cl_event *event) CL_API_SUFFIX__VERSION_1_0
39{39{
40 cl_device_id device;40 cl_device_id device;
41 unsigned i;
4142
42 if (command_queue == NULL)43 if (command_queue == NULL)
43 return CL_INVALID_COMMAND_QUEUE;44 return CL_INVALID_COMMAND_QUEUE;
@@ -54,6 +55,13 @@
5455
55 device = command_queue->device;56 device = command_queue->device;
5657
58 for (i = 0; i < command_queue->context->num_devices; ++i)
59 {
60 if (command_queue->context->devices[i] == device)
61 break;
62 }
63 assert(i < command_queue->context->num_devices);
64
57 if (event != NULL)65 if (event != NULL)
58 {66 {
59 *event = (cl_event)malloc(sizeof(struct _cl_event));67 *event = (cl_event)malloc(sizeof(struct _cl_event));
@@ -62,14 +70,15 @@
62 POCL_INIT_OBJECT(*event);70 POCL_INIT_OBJECT(*event);
63 (*event)->queue = command_queue;71 (*event)->queue = command_queue;
64 POCL_INIT_ICD_OBJECT(*event);72 POCL_INIT_ICD_OBJECT(*event);
65
66 clRetainCommandQueue (command_queue);73 clRetainCommandQueue (command_queue);
6774
68 POCL_PROFILE_QUEUED; 75 POCL_PROFILE_QUEUED;
69 }76 }
7077
7178
72 /* enqueue the read, or execute directly */79 /* enqueue the read, or execute directly */
80 /* TODO: why do we implement both? direct execution seems
81 unnecessary. */
73 if (blocking_read)82 if (blocking_read)
74 {83 {
75 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)84 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
@@ -101,16 +110,17 @@
101 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));110 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));
102 if (cmd == NULL)111 if (cmd == NULL)
103 return CL_OUT_OF_HOST_MEMORY;112 return CL_OUT_OF_HOST_MEMORY;
104 113
105 cmd->type = CL_COMMAND_TYPE_READ;114 cmd->type = CL_COMMAND_TYPE_READ;
106 cmd->command.read.data = device->data;115 cmd->command.read.data = device->data;
107 cmd->command.read.host_ptr = ptr;116 cmd->command.read.host_ptr = ptr;
108 cmd->command.read.device_ptr = buffer->device_ptrs[device->dev_id]+offset;117 cmd->command.read.device_ptr = buffer->device_ptrs[device->dev_id]+offset;
109 cmd->command.read.cb = cb;118 cmd->command.read.cb = cb;
119 cmd->command.read.buffer = buffer;
110 cmd->next = NULL;120 cmd->next = NULL;
111 cmd->command.read.buffer = buffer;121 cmd->event = event ? *event : NULL;
112 clRetainMemObject (buffer);122 clRetainMemObject (buffer);
113 LL_APPEND(command_queue->root, cmd );123 LL_APPEND(command_queue->root, cmd);
114 }124 }
115125
116 return CL_SUCCESS;126 return CL_SUCCESS;
117127
=== modified file 'lib/CL/clEnqueueReadBufferRect.c'
--- lib/CL/clEnqueueReadBufferRect.c 2012-05-25 16:05:51 +0000
+++ lib/CL/clEnqueueReadBufferRect.c 2012-08-15 20:36:22 +0000
@@ -22,6 +22,7 @@
22*/22*/
2323
24#include "pocl_cl.h"24#include "pocl_cl.h"
25#include "pocl_icd.h"
25#include <assert.h>26#include <assert.h>
26#include <stdio.h>27#include <stdio.h>
2728
@@ -41,7 +42,7 @@
41 const cl_event *event_wait_list,42 const cl_event *event_wait_list,
42 cl_event *event) CL_API_SUFFIX__VERSION_1_143 cl_event *event) CL_API_SUFFIX__VERSION_1_1
43{44{
44 cl_device_id device_id;45 cl_device_id device;
45 unsigned i;46 unsigned i;
4647
47 if (command_queue == NULL)48 if (command_queue == NULL)
@@ -58,28 +59,67 @@
58 (host_origin == NULL) ||59 (host_origin == NULL) ||
59 (region == NULL))60 (region == NULL))
60 return CL_INVALID_VALUE;61 return CL_INVALID_VALUE;
61 62
62 if ((region[0]*region[1]*region[2] > 0) &&63 if ((region[0]*region[1]*region[2] > 0) &&
63 (buffer_origin[0] + region[0]-1 +64 (buffer_origin[0] + region[0]-1 +
64 buffer_row_pitch * (buffer_origin[1] + region[1]-1) +65 buffer_row_pitch * (buffer_origin[1] + region[1]-1) +
65 buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size))66 buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size))
66 return CL_INVALID_VALUE;67 return CL_INVALID_VALUE;
6768
68 device_id = command_queue->device;69 device = command_queue->device;
70
69 for (i = 0; i < command_queue->context->num_devices; ++i)71 for (i = 0; i < command_queue->context->num_devices; ++i)
70 {72 {
71 if (command_queue->context->devices[i] == device_id)73 if (command_queue->context->devices[i] == device)
72 break;74 break;
73 }75 }
74
75 assert(i < command_queue->context->num_devices);76 assert(i < command_queue->context->num_devices);
7677
77 device_id->read_rect(device_id->data, ptr, 78 if (event != NULL)
78 buffer->device_ptrs[device_id->dev_id],79 {
79 buffer_origin, host_origin, region,80 *event = (cl_event)malloc(sizeof(struct _cl_event));
80 buffer_row_pitch, buffer_slice_pitch,81 if (*event == NULL)
81 host_row_pitch, host_slice_pitch);82 return CL_OUT_OF_HOST_MEMORY;
82 83 POCL_INIT_OBJECT(*event);
84 (*event)->queue = command_queue;
85 POCL_INIT_ICD_OBJECT(*event);
86
87 clRetainCommandQueue (command_queue);
88
89 POCL_PROFILE_QUEUED;
90 }
91
92
93 /* execute directly */
94 /* TODO: enqueue the read_rect if this is a non-blocking read (see
95 clEnqueueReadBuffer) */
96 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
97 {
98 /* wait for the event in event_wait_list to finish */
99 POCL_ABORT_UNIMPLEMENTED();
100 }
101 else
102 {
103 /* in-order queue - all previously enqueued commands must
104 * finish before this read */
105 // ensure our buffer is not freed yet
106 clRetainMemObject (buffer);
107 clFinish(command_queue);
108 }
109 POCL_PROFILE_SUBMITTED;
110 POCL_PROFILE_RUNNING;
111
112 /* TODO: offset computation doesn't work in case the ptr is not
113 a direct pointer */
114 device->read_rect(device->data, ptr,
115 buffer->device_ptrs[device->dev_id],
116 buffer_origin, host_origin, region,
117 buffer_row_pitch, buffer_slice_pitch,
118 host_row_pitch, host_slice_pitch);
119
120 POCL_PROFILE_COMPLETE;
121
122 clReleaseMemObject (buffer);
83123
84 return CL_SUCCESS;124 return CL_SUCCESS;
85}125}
86126
=== modified file 'lib/CL/clEnqueueUnmapMemObject.c'
--- lib/CL/clEnqueueUnmapMemObject.c 2012-05-29 12:16:56 +0000
+++ lib/CL/clEnqueueUnmapMemObject.c 2012-08-15 20:36:22 +0000
@@ -104,6 +104,6 @@
104 DL_DELETE(memobj->mappings, mapping);104 DL_DELETE(memobj->mappings, mapping);
105 memobj->map_count--;105 memobj->map_count--;
106 clReleaseMemObject (memobj);106 clReleaseMemObject (memobj);
107 event = NULL;107
108 return CL_SUCCESS;108 return CL_SUCCESS;
109}109}
110110
=== modified file 'lib/CL/clEnqueueWriteBuffer.c'
--- lib/CL/clEnqueueWriteBuffer.c 2012-05-29 12:16:56 +0000
+++ lib/CL/clEnqueueWriteBuffer.c 2012-08-15 20:36:22 +0000
@@ -37,7 +37,7 @@
37 const cl_event *event_wait_list,37 const cl_event *event_wait_list,
38 cl_event *event) CL_API_SUFFIX__VERSION_1_038 cl_event *event) CL_API_SUFFIX__VERSION_1_0
39{39{
40 cl_device_id device_id;40 cl_device_id device;
41 unsigned i;41 unsigned i;
4242
43 if (command_queue == NULL)43 if (command_queue == NULL)
@@ -53,13 +53,13 @@
53 (offset + cb > buffer->size))53 (offset + cb > buffer->size))
54 return CL_INVALID_VALUE;54 return CL_INVALID_VALUE;
5555
56 device_id = command_queue->device;56 device = command_queue->device;
57
57 for (i = 0; i < command_queue->context->num_devices; ++i)58 for (i = 0; i < command_queue->context->num_devices; ++i)
58 {59 {
59 if (command_queue->context->devices[i] == device_id)60 if (command_queue->context->devices[i] == device)
60 break;61 break;
61 }62 }
62
63 assert(i < command_queue->context->num_devices);63 assert(i < command_queue->context->num_devices);
6464
65 if (event != NULL)65 if (event != NULL)
@@ -76,6 +76,8 @@
76 }76 }
7777
78 /* enqueue the write, or execute directly */78 /* enqueue the write, or execute directly */
79 /* TODO: why do we implement both? direct execution seems
80 unnecessary. */
79 if (blocking_write)81 if (blocking_write)
80 {82 {
81 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)83 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
@@ -94,17 +96,9 @@
94 POCL_PROFILE_SUBMITTED;96 POCL_PROFILE_SUBMITTED;
95 POCL_PROFILE_RUNNING;97 POCL_PROFILE_RUNNING;
96 /* TODO: fixme. The offset computation must be done at the device driver. */98 /* TODO: fixme. The offset computation must be done at the device driver. */
97 device_id->write(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id]+offset, cb);99 device->write(device->data, ptr, buffer->device_ptrs[device->dev_id]+offset, cb);
98 POCL_PROFILE_COMPLETE;100 POCL_PROFILE_COMPLETE;
99101
100 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE &&
101 event != NULL)
102 {
103 (*event)->status = CL_COMPLETE;
104 (*event)->time_end =
105 command_queue->device->get_timer_value(command_queue->device->data);
106 }
107
108 clReleaseMemObject (buffer);102 clReleaseMemObject (buffer);
109 }103 }
110 else104 else
@@ -112,14 +106,15 @@
112 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));106 _cl_command_node * cmd = malloc(sizeof(_cl_command_node));
113 if (cmd == NULL)107 if (cmd == NULL)
114 return CL_OUT_OF_HOST_MEMORY;108 return CL_OUT_OF_HOST_MEMORY;
115 109
116 cmd->type = CL_COMMAND_TYPE_WRITE;110 cmd->type = CL_COMMAND_TYPE_WRITE;
117 cmd->command.write.data = device_id->data;111 cmd->command.write.data = device->data;
118 cmd->command.write.host_ptr = ptr;112 cmd->command.write.host_ptr = ptr;
119 cmd->command.write.device_ptr = buffer->device_ptrs[i]+offset;113 cmd->command.write.device_ptr = buffer->device_ptrs[i]+offset;
120 cmd->command.write.cb = cb;114 cmd->command.write.cb = cb;
115 cmd->command.write.buffer = buffer;
121 cmd->next = NULL;116 cmd->next = NULL;
122 cmd->command.write.buffer = buffer;117 cmd->event = event ? *event : NULL;
123 clRetainMemObject (buffer);118 clRetainMemObject (buffer);
124119
125 LL_APPEND(command_queue->root, cmd);120 LL_APPEND(command_queue->root, cmd);
126121
=== modified file 'lib/CL/clEnqueueWriteBufferRect.c'
--- lib/CL/clEnqueueWriteBufferRect.c 2012-05-14 11:45:48 +0000
+++ lib/CL/clEnqueueWriteBufferRect.c 2012-08-15 20:36:22 +0000
@@ -40,7 +40,7 @@
40 const cl_event *event_wait_list,40 const cl_event *event_wait_list,
41 cl_event *event) CL_API_SUFFIX__VERSION_1_141 cl_event *event) CL_API_SUFFIX__VERSION_1_1
42{42{
43 cl_device_id device_id;43 cl_device_id device;
44 unsigned i;44 unsigned i;
4545
46 if (command_queue == NULL)46 if (command_queue == NULL)
@@ -57,43 +57,55 @@
57 (host_origin == NULL) ||57 (host_origin == NULL) ||
58 (region == NULL))58 (region == NULL))
59 return CL_INVALID_VALUE;59 return CL_INVALID_VALUE;
60 60
61 if ((region[0]*region[1]*region[2] > 0) &&61 if ((region[0]*region[1]*region[2] > 0) &&
62 (buffer_origin[0] + region[0]-1 +62 (buffer_origin[0] + region[0]-1 +
63 buffer_row_pitch * (buffer_origin[1] + region[1]-1) +63 buffer_row_pitch * (buffer_origin[1] + region[1]-1) +
64 buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size))64 buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size))
65 {65 {
66 POCL_ABORT_UNIMPLEMENTED();66 POCL_ABORT_UNIMPLEMENTED();
67#if 0
68 printf("bo=[%d,%d,%d]\n"
69 "ho=[%d,%d,%d]\n"
70 "re=[%d,%d,%d]\n"
71 "bp=[,%d,%d]\n"
72 "hp=[,%d,%d]\n"
73 "bs=[%d]\n",
74 (int)buffer_origin[0], (int)buffer_origin[1], (int)buffer_origin[2],
75 (int)host_origin[0], (int)host_origin[1], (int)host_origin[2],
76 (int)region[0], (int)region[1], (int)region[2],
77 (int)buffer_row_pitch, (int)buffer_slice_pitch,
78 (int)host_row_pitch, (int)host_slice_pitch,
79 (int)buffer->size);
80#endif
81 return CL_INVALID_VALUE;67 return CL_INVALID_VALUE;
82 }68 }
8369
84 device_id = command_queue->device;70 device = command_queue->device;
71
85 for (i = 0; i < command_queue->context->num_devices; ++i)72 for (i = 0; i < command_queue->context->num_devices; ++i)
86 {73 {
87 if (command_queue->context->devices[i] == device_id)74 if (command_queue->context->devices[i] == device)
88 break;75 break;
89 }76 }
90
91 assert(i < command_queue->context->num_devices);77 assert(i < command_queue->context->num_devices);
9278
93 device_id->write_rect(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id],79
80 /* execute directly */
81 /* TODO: enqueue the write_rect if this is a non-blocking read (see
82 clEnqueueWriteBuffer) */
83 if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
84 {
85 /* wait for the event in event_wait_list to finish */
86 POCL_ABORT_UNIMPLEMENTED();
87 }
88 else
89 {
90 /* in-order queue - all previously enqueued commands must
91 * finish before this read */
92 // ensure our buffer is not freed yet
93 clRetainMemObject (buffer);
94 clFinish(command_queue);
95 }
96 POCL_PROFILE_SUBMITTED;
97 POCL_PROFILE_RUNNING;
98
99 /* TODO: offset computation doesn't work in case the ptr is not
100 a direct pointer */
101 device->write_rect(device->data, ptr, buffer->device_ptrs[device->dev_id],
94 buffer_origin, host_origin, region,102 buffer_origin, host_origin, region,
95 buffer_row_pitch, buffer_slice_pitch,103 buffer_row_pitch, buffer_slice_pitch,
96 host_row_pitch, host_slice_pitch);104 host_row_pitch, host_slice_pitch);
97105
106 POCL_PROFILE_COMPLETE;
107
108 clReleaseMemObject (buffer);
109
98 return CL_SUCCESS;110 return CL_SUCCESS;
99}111}
100112
=== modified file 'lib/CL/clFinish.c'
--- lib/CL/clFinish.c 2012-05-31 12:01:27 +0000
+++ lib/CL/clFinish.c 2012-08-15 20:36:22 +0000
@@ -41,6 +41,7 @@
41 {41 {
42 case CL_COMMAND_TYPE_READ:42 case CL_COMMAND_TYPE_READ:
43 POCL_PROFILE_SUBMITTED;43 POCL_PROFILE_SUBMITTED;
44 POCL_PROFILE_RUNNING;
44 command_queue->device->read45 command_queue->device->read
45 (node->command.read.data, 46 (node->command.read.data,
46 node->command.read.host_ptr, 47 node->command.read.host_ptr,
@@ -51,6 +52,7 @@
51 break;52 break;
52 case CL_COMMAND_TYPE_WRITE:53 case CL_COMMAND_TYPE_WRITE:
53 POCL_PROFILE_SUBMITTED;54 POCL_PROFILE_SUBMITTED;
55 POCL_PROFILE_RUNNING;
54 command_queue->device->write56 command_queue->device->write
55 (node->command.write.data, 57 (node->command.write.data,
56 node->command.write.host_ptr, 58 node->command.write.host_ptr,
@@ -61,6 +63,7 @@
61 break;63 break;
62 case CL_COMMAND_TYPE_COPY:64 case CL_COMMAND_TYPE_COPY:
63 POCL_PROFILE_SUBMITTED;65 POCL_PROFILE_SUBMITTED;
66 POCL_PROFILE_RUNNING;
64 command_queue->device->copy67 command_queue->device->copy
65 (node->command.copy.data, 68 (node->command.copy.data,
66 node->command.copy.src_ptr, 69 node->command.copy.src_ptr,
@@ -71,9 +74,11 @@
71 clReleaseMemObject (node->command.copy.dst_buffer);74 clReleaseMemObject (node->command.copy.dst_buffer);
72 break;75 break;
73 case CL_COMMAND_TYPE_RUN:76 case CL_COMMAND_TYPE_RUN:
74 POCL_PROFILE_SUBMITTED;
75 assert (*event == node->event);77 assert (*event == node->event);
78 POCL_PROFILE_SUBMITTED;
79 POCL_PROFILE_RUNNING;
76 command_queue->device->run(node->command.run.data, node);80 command_queue->device->run(node->command.run.data, node);
81 POCL_PROFILE_COMPLETE;
77 for (i = 0; i < node->command.run.arg_buffer_count; ++i)82 for (i = 0; i < node->command.run.arg_buffer_count; ++i)
78 {83 {
79 cl_mem buf = node->command.run.arg_buffers[i];84 cl_mem buf = node->command.run.arg_buffers[i];
8085
=== modified file 'lib/CL/clGetDeviceInfo.c'
--- lib/CL/clGetDeviceInfo.c 2012-08-08 12:45:17 +0000
+++ lib/CL/clGetDeviceInfo.c 2012-08-15 20:36:22 +0000
@@ -126,23 +126,17 @@
126 }126 }
127 127
128 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:128 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:
129 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char); 129 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char);
130
131 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT:130 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT:
132 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short);131 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short);
133
134 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:132 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:
135 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int);133 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int);
136
137 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:134 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:
138 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long);135 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long);
139
140 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT:136 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT:
141 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float);137 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float);
142
143 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:138 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:
144 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double);139 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double);
145
146 case CL_DEVICE_MAX_CLOCK_FREQUENCY :140 case CL_DEVICE_MAX_CLOCK_FREQUENCY :
147 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_clock_frequency);141 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_clock_frequency);
148 case CL_DEVICE_ADDRESS_BITS :142 case CL_DEVICE_ADDRESS_BITS :
@@ -186,7 +180,7 @@
186 case CL_DEVICE_MAX_CONSTANT_ARGS : 180 case CL_DEVICE_MAX_CONSTANT_ARGS :
187 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_constant_args);181 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_constant_args);
188 case CL_DEVICE_LOCAL_MEM_TYPE :182 case CL_DEVICE_LOCAL_MEM_TYPE :
189 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_size);183 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_type);
190 case CL_DEVICE_LOCAL_MEM_SIZE:184 case CL_DEVICE_LOCAL_MEM_SIZE:
191 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->local_mem_size);185 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->local_mem_size);
192 case CL_DEVICE_ERROR_CORRECTION_SUPPORT :186 case CL_DEVICE_ERROR_CORRECTION_SUPPORT :
@@ -218,28 +212,34 @@
218212
219 case CL_DEVICE_EXTENSIONS : 213 case CL_DEVICE_EXTENSIONS :
220 POCL_RETURN_DEVICE_INFO_STR("cl_khr_fp16");214 POCL_RETURN_DEVICE_INFO_STR("cl_khr_fp16");
221 case CL_DEVICE_PLATFORM : break;215 case CL_DEVICE_PLATFORM :
222 case CL_DEVICE_DOUBLE_FP_CONFIG : break;216 {
217 /* Return the first platform id, assuming this is the only
218 platform id (which is currently always the case for pocl) */
219 cl_platform_id platform_id;
220 clGetPlatformIDs(1, &platform_id, NULL);
221 POCL_RETURN_DEVICE_INFO(cl_platform_id, platform_id);
222 }
223 case CL_DEVICE_DOUBLE_FP_CONFIG :
224 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->double_fp_config);
223 case CL_DEVICE_HALF_FP_CONFIG : break;225 case CL_DEVICE_HALF_FP_CONFIG : break;
224 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF : break;226 case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF :
227 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_half);
225 case CL_DEVICE_HOST_UNIFIED_MEMORY : break;228 case CL_DEVICE_HOST_UNIFIED_MEMORY : break;
226 /* TODO: figure out what the difference between preferred and native
227 widths are. And why there is no struct fields 'native_vector...' */
228 case CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR : 229 case CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR :
229 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char);230 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_char);
230 case CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT :231 case CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT :
231 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short);232 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_short);
232 case CL_DEVICE_NATIVE_VECTOR_WIDTH_INT : 233 case CL_DEVICE_NATIVE_VECTOR_WIDTH_INT :
233 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int);234 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_int);
234 case CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG : 235 case CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG :
235 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long);236 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_long);
236 case CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT : 237 case CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT :
237 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float);238 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_float);
238 case CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE : 239 case CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE :
239 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double);240 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_double);
240 case CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF : 241 case CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF :
241 /* TODO: why is there no preferred_vector_width_half? */242 POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_half);
242 POCL_RETURN_DEVICE_INFO(cl_uint, 0);
243 case CL_DEVICE_OPENCL_C_VERSION :243 case CL_DEVICE_OPENCL_C_VERSION :
244 POCL_RETURN_DEVICE_INFO_STR("1.2");244 POCL_RETURN_DEVICE_INFO_STR("1.2");
245 }245 }
246246
=== modified file 'lib/CL/clGetKernelWorkGroupInfo.c'
--- lib/CL/clGetKernelWorkGroupInfo.c 2012-03-28 11:25:45 +0000
+++ lib/CL/clGetKernelWorkGroupInfo.c 2012-08-15 20:36:22 +0000
@@ -65,7 +65,7 @@
65 POCL_ABORT_UNIMPLEMENTED();65 POCL_ABORT_UNIMPLEMENTED();
66 66
67 case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE:67 case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE:
68 POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple); 68 POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple);
69 69
70 case CL_KERNEL_LOCAL_MEM_SIZE:70 case CL_KERNEL_LOCAL_MEM_SIZE:
71 POCL_RETURN_KERNEL_WG_INFO(cl_ulong, device->local_mem_size);71 POCL_RETURN_KERNEL_WG_INFO(cl_ulong, device->local_mem_size);
7272
=== modified file 'lib/CL/clReleaseProgram.c'
--- lib/CL/clReleaseProgram.c 2012-05-31 12:01:27 +0000
+++ lib/CL/clReleaseProgram.c 2012-08-15 20:36:22 +0000
@@ -58,7 +58,7 @@
58 free (program->binary_sizes);58 free (program->binary_sizes);
5959
60 env = getenv ("POCL_LEAVE_TEMP_DIRS");60 env = getenv ("POCL_LEAVE_TEMP_DIRS");
61 if (!(env != NULL && strlen (env) == 1 && env[0] == '1') &&61 if (!(env != NULL && strcmp (env, "1") == 0) &&
62 getenv("POCL_TEMP_DIR") == NULL)62 getenv("POCL_TEMP_DIR") == NULL)
63 {63 {
64 remove_directory (program->temp_dir);64 remove_directory (program->temp_dir);
6565
=== modified file 'lib/CL/devices/basic/basic.h'
--- lib/CL/devices/basic/basic.h 2012-08-08 14:54:32 +0000
+++ lib/CL/devices/basic/basic.h 2012-08-15 20:36:22 +0000
@@ -46,6 +46,15 @@
46 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \46 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
47 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \47 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
48 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \48 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
49 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \
50 /* TODO: figure out what the difference between preferred and native widths are. */ \
51 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \
52 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \
53 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \
54 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
55 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
56 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
57 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \
49 0, /* max_clock_frequency */ \58 0, /* max_clock_frequency */ \
50 0, /* address_bits */ \59 0, /* address_bits */ \
51 0, /* max_mem_alloc_size */ \60 0, /* max_mem_alloc_size */ \
@@ -62,6 +71,7 @@
62 0, /* mem_base_addr_align */ \71 0, /* mem_base_addr_align */ \
63 0, /* min_data_type_align_size */ \72 0, /* min_data_type_align_size */ \
64 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \73 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \
74 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \
65 CL_NONE, /* global_mem_cache_type */ \75 CL_NONE, /* global_mem_cache_type */ \
66 0, /* global_mem_cacheline_size */ \76 0, /* global_mem_cacheline_size */ \
67 0, /* global_mem_cache_size */ \77 0, /* global_mem_cache_size */ \
6878
=== modified file 'lib/CL/devices/common.h'
--- lib/CL/devices/common.h 2012-04-24 13:12:25 +0000
+++ lib/CL/devices/common.h 2012-08-15 20:36:22 +0000
@@ -70,5 +70,6 @@
70#endif70#endif
71/* Half is internally represented as short */71/* Half is internally represented as short */
72#define POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT72#define POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT
73#define POCL_DEVICES_NATIVE_VECTOR_WIDTH_HALF POCL_DEVICES_NATIVE_VECTOR_WIDTH_SHORT
7374
74#endif75#endif
7576
=== modified file 'lib/CL/devices/devices.c'
--- lib/CL/devices/devices.c 2012-05-31 12:01:27 +0000
+++ lib/CL/devices/devices.c 2012-08-15 20:36:22 +0000
@@ -57,7 +57,7 @@
57pocl_init_devices()57pocl_init_devices()
58{58{
59 const char *device_list;59 const char *device_list;
60 char *ptr, *tofree, *token, *saveptr, *saveptr2;60 char *ptr, *tofree, *token, *saveptr;
61 int i, devcount;61 int i, devcount;
62 if (pocl_num_devices > 0)62 if (pocl_num_devices > 0)
63 return;63 return;
@@ -83,7 +83,7 @@
8383
84 ptr = tofree = strdup(device_list);84 ptr = tofree = strdup(device_list);
85 devcount = 0;85 devcount = 0;
86 while ((token = strtok_r (ptr, " ", &saveptr2)) != NULL)86 while ((token = strtok_r (ptr, " ", &saveptr)) != NULL)
87 {87 {
88 struct _cl_device_id* device_type = NULL;88 struct _cl_device_id* device_type = NULL;
8989
9090
=== modified file 'lib/CL/devices/pthread/pocl-pthread.h'
--- lib/CL/devices/pthread/pocl-pthread.h 2012-08-08 14:54:32 +0000
+++ lib/CL/devices/pthread/pocl-pthread.h 2012-08-15 20:36:22 +0000
@@ -50,6 +50,15 @@
50 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \50 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
51 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \51 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
52 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \52 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
53 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF, /* preferred_vector_width_half */ \
54 /* TODO: figure out what the difference between preferred and native widths are. */ \
55 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \
56 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \
57 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \
58 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \
59 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \
60 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \
61 POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \
53 0, /* max_clock_frequency */ \62 0, /* max_clock_frequency */ \
54 0, /* address_bits */ \63 0, /* address_bits */ \
55 0, /* max_mem_alloc_size */ \64 0, /* max_mem_alloc_size */ \
@@ -66,6 +75,7 @@
66 0, /* mem_base_addr_align */ \75 0, /* mem_base_addr_align */ \
67 0, /* min_data_type_align_size */ \76 0, /* min_data_type_align_size */ \
68 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \77 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \
78 CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \
69 CL_NONE, /* global_mem_cache_type */ \79 CL_NONE, /* global_mem_cache_type */ \
70 0, /* global_mem_cacheline_size */ \80 0, /* global_mem_cacheline_size */ \
71 0, /* global_mem_cache_size */ \81 0, /* global_mem_cache_size */ \
7282
=== modified file 'lib/CL/pocl_cl.h'
--- lib/CL/pocl_cl.h 2012-08-08 14:54:32 +0000
+++ lib/CL/pocl_cl.h 2012-08-15 20:36:22 +0000
@@ -25,6 +25,7 @@
25#define POCL_CL_H25#define POCL_CL_H
2626
27#include "config.h"27#include "config.h"
28#include <assert.h>
28#include <stdio.h>29#include <stdio.h>
29#include <ltdl.h>30#include <ltdl.h>
30#include <pthread.h>31#include <pthread.h>
@@ -150,6 +151,14 @@
150 cl_uint preferred_vector_width_long;151 cl_uint preferred_vector_width_long;
151 cl_uint preferred_vector_width_float;152 cl_uint preferred_vector_width_float;
152 cl_uint preferred_vector_width_double;153 cl_uint preferred_vector_width_double;
154 cl_uint preferred_vector_width_half;
155 cl_uint native_vector_width_char;
156 cl_uint native_vector_width_short;
157 cl_uint native_vector_width_int;
158 cl_uint native_vector_width_long;
159 cl_uint native_vector_width_float;
160 cl_uint native_vector_width_double;
161 cl_uint native_vector_width_half;
153 cl_uint max_clock_frequency;162 cl_uint max_clock_frequency;
154 cl_uint address_bits;163 cl_uint address_bits;
155 cl_ulong max_mem_alloc_size;164 cl_ulong max_mem_alloc_size;
@@ -166,6 +175,7 @@
166 cl_uint mem_base_addr_align;175 cl_uint mem_base_addr_align;
167 cl_uint min_data_type_align_size;176 cl_uint min_data_type_align_size;
168 cl_device_fp_config single_fp_config;177 cl_device_fp_config single_fp_config;
178 cl_device_fp_config double_fp_config;
169 cl_device_mem_cache_type global_mem_cache_type;179 cl_device_mem_cache_type global_mem_cache_type;
170 cl_uint global_mem_cacheline_size;180 cl_uint global_mem_cacheline_size;
171 cl_ulong global_mem_cache_size;181 cl_ulong global_mem_cache_size;
@@ -394,12 +404,12 @@
394 } \404 } \
395 } while (0) \405 } while (0) \
396406
397
398#define POCL_PROFILE_SUBMITTED \407#define POCL_PROFILE_SUBMITTED \
399 do { \408 do { \
400 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \409 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \
401 event != NULL && (*event) != NULL) \410 event != NULL && (*event) != NULL) \
402 { \411 { \
412 assert((*event)->status = CL_QUEUED); \
403 (*event)->status = CL_SUBMITTED; \413 (*event)->status = CL_SUBMITTED; \
404 (*event)->time_submit = \414 (*event)->time_submit = \
405 command_queue->device->get_timer_value(command_queue->device->data); \415 command_queue->device->get_timer_value(command_queue->device->data); \
@@ -411,6 +421,7 @@
411 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \421 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \
412 event != NULL && (*event) != NULL) \422 event != NULL && (*event) != NULL) \
413 { \423 { \
424 assert((*event)->status = CL_SUBMITTED); \
414 (*event)->status = CL_RUNNING; \425 (*event)->status = CL_RUNNING; \
415 (*event)->time_start = \426 (*event)->time_start = \
416 command_queue->device->get_timer_value(command_queue->device->data); \427 command_queue->device->get_timer_value(command_queue->device->data); \
@@ -422,6 +433,7 @@
422 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \433 if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \
423 event != NULL && (*event) != NULL) \434 event != NULL && (*event) != NULL) \
424 { \435 { \
436 assert((*event)->status = CL_RUNNING); \
425 (*event)->status = CL_COMPLETE; \437 (*event)->status = CL_COMPLETE; \
426 (*event)->time_end = \438 (*event)->time_end = \
427 command_queue->device->get_timer_value(command_queue->device->data); \439 command_queue->device->get_timer_value(command_queue->device->data); \
428440
=== added file 'lib/kernel/atomics.cl'
--- lib/kernel/atomics.cl 1970-01-01 00:00:00 +0000
+++ lib/kernel/atomics.cl 2012-08-15 20:36:22 +0000
@@ -0,0 +1,149 @@
1/* OpenCL built-in library: atomic operations
2
3 Copyright (c) 2012 Universidad Rey Juan Carlos
4
5 Permission is hereby granted, free of charge, to any person obtaining a copy
6 of this software and associated documentation files (the "Software"), to deal
7 in the Software without restriction, including without limitation the rights
8 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 copies of the Software, and to permit persons to whom the Software is
10 furnished to do so, subject to the following conditions:
11
12 The above copyright notice and this permission notice shall be included in
13 all copies or substantial portions of the Software.
14
15 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 THE SOFTWARE.
22*/
23
24
25
26// Repeat the content of this file several times with different values
27// for Q, T, and U:
28#if !defined(Q)
29
30# define Q __global
31# include "atomics.cl"
32# undef Q
33
34# define Q __local
35# include "atomics.cl"
36# undef Q
37
38#elif !defined(T)
39
40# define T int
41# define MIN __sync_fetch_and_min
42# define MAX __sync_fetch_and_max
43# include "atomics.cl"
44# undef T
45# undef MIN
46# undef MAX
47
48# define T uint
49# define MIN __sync_fetch_and_umin
50# define MAX __sync_fetch_and_umax
51# include "atomics.cl"
52# undef T
53# undef MIN
54# undef MAX
55
56
57// xchg is also supported for float as a special case
58__attribute__((overloadable))
59float atomic_xchg(volatile Q float *p, float val)
60{
61 // NOTE: We compare the float as int here...
62 return __atomic_exchange_n((volatile int*)p, val, __ATOMIC_RELAXED);
63}
64
65#else
66
67
68
69// basic
70
71// read, add, store
72__attribute__((overloadable))
73T atomic_add(volatile Q T *p, T val)
74{
75 return __sync_fetch_and_add((volatile T*)p, val, __ATOMIC_RELAXED);
76}
77
78// read, subtract, store
79__attribute__((overloadable))
80T atomic_sub(volatile Q T *p, T val)
81{
82 return __sync_fetch_and_sub(p, val, __ATOMIC_RELAXED);
83}
84
85// read, swap, store
86__attribute__((overloadable))
87T atomic_xchg(volatile Q T *p, T val)
88{
89 return __atomic_exchange_n(p, val, __ATOMIC_RELAXED);
90}
91
92// read, increment, store
93__attribute__((overloadable))
94T atomic_inc(volatile Q T *p)
95{
96 return atomic_add(p, (T)1);
97}
98
99// read, decrement, store
100__attribute__((overloadable))
101T atomic_dec(volatile Q T *p)
102{
103 return atomic_sub(p, (T)1);
104}
105
106// read, store
107__attribute__((overloadable))
108T atomic_cmpxchg(volatile Q T *p, T cmp, T val)
109{
110 __atomic_compare_exchange_n(p, &cmp, val, false,
111 __ATOMIC_RELAXED, __ATOMIC_RELAXED);
112 return cmp;
113}
114
115// extended
116
117__attribute__((overloadable))
118T atomic_min(volatile Q T *p, T val)
119{
120 return MIN((volatile T*)p, val);
121}
122
123__attribute__((overloadable))
124T atomic_max(volatile Q T *p, T val)
125{
126 return MAX((volatile T*)p, val);
127}
128
129__attribute__((overloadable))
130T atomic_and(volatile Q T *p, T val)
131{
132 return __sync_fetch_and_and(p, val, __ATOMIC_RELAXED);
133}
134
135__attribute__((overloadable))
136T atomic_or(volatile Q T *p, T val)
137{
138 return __sync_fetch_and_or(p, val, __ATOMIC_RELAXED);
139}
140
141__attribute__((overloadable))
142T atomic_xor(volatile Q T *p, T val)
143{
144 return __sync_fetch_and_xor(p, val, __ATOMIC_RELAXED);
145}
146
147
148
149#endif
0150
=== modified file 'lib/kernel/hadd.cl'
--- lib/kernel/hadd.cl 2011-10-26 13:03:37 +0000
+++ lib/kernel/hadd.cl 2012-08-15 20:36:22 +0000
@@ -23,7 +23,5 @@
2323
24#include "templates.h"24#include "templates.h"
2525
26// This could do with some testing
27// This could probably also be optimised (i.e. the ?: operators eliminated)
28DEFINE_EXPR_G_GG(hadd,26DEFINE_EXPR_G_GG(hadd,
29 (a >> (sgtype)1) + (b >> (sgtype)1) + (a & b & (gtype)1))27 (a >> (sgtype)1) + (b >> (sgtype)1) + (a & b & (gtype)1))
3028
=== modified file 'lib/kernel/rhadd.cl'
--- lib/kernel/rhadd.cl 2011-10-26 13:03:37 +0000
+++ lib/kernel/rhadd.cl 2012-08-15 20:36:22 +0000
@@ -23,7 +23,5 @@
2323
24#include "templates.h"24#include "templates.h"
2525
26// This could do with some testing
27// This could probably also be optimised (i.e. the ?: operators eliminated)
28DEFINE_EXPR_G_GG(rhadd,26DEFINE_EXPR_G_GG(rhadd,
29 (a >> (sgtype)1) + (b >> (sgtype)1) + ((a | b) & (gtype)1))27 (a >> (sgtype)1) + (b >> (sgtype)1) + ((a | b) & (gtype)1))
3028
=== modified file 'lib/kernel/sources.mk'
--- lib/kernel/sources.mk 2012-06-04 12:15:18 +0000
+++ lib/kernel/sources.mk 2012-08-15 20:36:22 +0000
@@ -1,144 +1,146 @@
1# Nodist here because these files should be included1# Nodist here because these files should be included
2# to the distribution only once, from the root kernel2# to the distribution only once, from the root kernel
3# makefile.3# makefile.
4nodist_libkernel_a_SOURCES = templates.h \4nodist_libkernel_a_SOURCES = \
5 barrier.ll \5 templates.h \
6 image.h \6 barrier.ll \
7 get_work_dim.c \7 image.h \
8 get_global_size.c \8 get_work_dim.c \
9 get_global_id.c \9 get_global_size.c \
10 get_local_size.c \10 get_global_id.c \
11 get_local_id.c \11 get_local_size.c \
12 get_num_groups.c \12 get_local_id.c \
13 get_group_id.c \13 get_num_groups.c \
14 get_global_offset.c \14 get_group_id.c \
15 as_type.cl \15 get_global_offset.c \
16 convert_type.cl \16 as_type.cl \
17 acos.cl \17 atomics.cl \
18 acosh.cl \18 convert_type.cl \
19 acospi.cl \19 acos.cl \
20 asin.cl \20 acosh.cl \
21 asinh.cl \21 acospi.cl \
22 asinpi.cl \22 asin.cl \
23 atan.cl \23 asinh.cl \
24 atan2.cl \24 asinpi.cl \
25 atan2pi.cl \25 atan.cl \
26 atanh.cl \26 atan2.cl \
27 atanpi.cl \27 atan2pi.cl \
28 cbrt.cl \28 atanh.cl \
29 ceil.cl \29 atanpi.cl \
30 copysign.cl \30 cbrt.cl \
31 cos.cl \31 ceil.cl \
32 cosh.cl \32 copysign.cl \
33 cospi.cl \33 cos.cl \
34 erfc.cl \34 cosh.cl \
35 erf.cl \35 cospi.cl \
36 exp.cl \36 erfc.cl \
37 exp2.cl \37 erf.cl \
38 exp10.cl \38 exp.cl \
39 expm1.cl \39 exp2.cl \
40 fabs.cl \40 exp10.cl \
41 fdim.cl \41 expm1.cl \
42 floor.cl \42 fabs.cl \
43 fma.cl \43 fdim.cl \
44 fmax.cl \44 floor.cl \
45 fmin.cl \45 fma.cl \
46 fmod.cl \46 fmax.cl \
47 fract.cl \47 fmin.cl \
48 hypot.cl \48 fmod.cl \
49 ilogb.cl \49 fract.cl \
50 ldexp.cl \50 hypot.cl \
51 lgamma.cl \51 ilogb.cl \
52 log.cl \52 ldexp.cl \
53 log2.cl \53 lgamma.cl \
54 log10.cl \54 log.cl \
55 log1p.cl \55 log2.cl \
56 logb.cl \56 log10.cl \
57 mad.cl \57 log1p.cl \
58 maxmag.cl \58 logb.cl \
59 minmag.cl \59 mad.cl \
60 nan.cl \60 maxmag.cl \
61 nextafter.cl \61 minmag.cl \
62 pow.cl \62 nan.cl \
63 pown.cl \63 nextafter.cl \
64 powr.cl \64 pow.cl \
65 remainder.cl \65 pown.cl \
66 rint.cl \66 powr.cl \
67 rootn.cl \67 remainder.cl \
68 round.cl \68 rint.cl \
69 rsqrt.cl \69 rootn.cl \
70 sin.cl \70 round.cl \
71 sincos.cl \71 rsqrt.cl \
72 sinh.cl \72 sin.cl \
73 sinpi.cl \73 sincos.cl \
74 sqrt.cl \74 sinh.cl \
75 tan.cl \75 sinpi.cl \
76 tanh.cl \76 sqrt.cl \
77 tanpi.cl \77 tan.cl \
78 tgamma.cl \78 tanh.cl \
79 trunc.cl \79 tanpi.cl \
80 divide.cl \80 tgamma.cl \
81 recip.cl \81 trunc.cl \
82 abs.cl \82 divide.cl \
83 abs_diff.cl \83 recip.cl \
84 add_sat.cl \84 abs.cl \
85 hadd.cl \85 abs_diff.cl \
86 rhadd.cl \86 add_sat.cl \
87 clamp.cl \87 hadd.cl \
88 clz.cl \88 rhadd.cl \
89 mad_hi.cl \89 clamp.cl \
90 mad_sat.cl \90 clz.cl \
91 max.cl \91 mad_hi.cl \
92 min.cl \92 mad_sat.cl \
93 mul_hi.cl \93 max.cl \
94 rotate.cl \94 min.cl \
95 sub_sat.cl \95 mul_hi.cl \
96 upsample.cl \96 rotate.cl \
97 popcount.cl \97 sub_sat.cl \
98 mad24.cl \98 upsample.cl \
99 mul24.cl \99 popcount.cl \
100 degrees.cl \100 mad24.cl \
101 mix.cl \101 mul24.cl \
102 radians.cl \102 degrees.cl \
103 step.cl \103 mix.cl \
104 smoothstep.cl \104 radians.cl \
105 sign.cl \105 step.cl \
106 cross.cl \106 smoothstep.cl \
107 dot.cl \107 sign.cl \
108 distance.cl \108 cross.cl \
109 length.cl \109 dot.cl \
110 normalize.cl \110 distance.cl \
111 fast_distance.cl \111 length.cl \
112 fast_length.cl \112 normalize.cl \
113 fast_normalize.cl \113 fast_distance.cl \
114 isequal.cl \114 fast_length.cl \
115 isnotequal.cl \115 fast_normalize.cl \
116 isgreater.cl \116 isequal.cl \
117 isgreaterequal.cl \117 isnotequal.cl \
118 isless.cl \118 isgreater.cl \
119 islessequal.cl \119 isgreaterequal.cl \
120 islessgreater.cl \120 isless.cl \
121 isfinite.cl \121 islessequal.cl \
122 isinf.cl \122 islessgreater.cl \
123 isnan.cl \123 isfinite.cl \
124 isnormal.cl \124 isinf.cl \
125 isordered.cl \125 isnan.cl \
126 isunordered.cl \126 isnormal.cl \
127 signbit.cl \127 isordered.cl \
128 any.cl \128 isunordered.cl \
129 all.cl \129 signbit.cl \
130 bitselect.cl \130 any.cl \
131 select.cl \131 all.cl \
132 vload.cl \132 bitselect.cl \
133 vstore.cl \133 select.cl \
134 vload_half.cl \134 vload.cl \
135 vstore_half.cl \135 vstore.cl \
136 async_work_group_copy.cl \136 vload_half.cl \
137 wait_group_events.cl \137 vstore_half.cl \
138 read_image.cl \138 async_work_group_copy.cl \
139 write_image.cl \139 wait_group_events.cl \
140 get_image_width.cl \140 read_image.cl \
141 get_image_height.cl 141 write_image.cl \
142 get_image_width.cl \
143 get_image_height.cl
142144
143barrier.o: barrier.ll145barrier.o: barrier.ll
144 $(LLVM_AS) -o $@ $<146 $(LLVM_AS) -o $@ $<
145147
=== modified file 'lib/kernel/sub_sat.cl'
--- lib/kernel/sub_sat.cl 2011-10-26 19:49:23 +0000
+++ lib/kernel/sub_sat.cl 2012-08-15 20:36:22 +0000
@@ -30,8 +30,6 @@
30// ushort __builtin_ia32_psubusw12830// ushort __builtin_ia32_psubusw128
31// Other types don't seem to be supported.31// Other types don't seem to be supported.
3232
33// This could do with some testing
34// This could probably also be optimised (i.e. the ?: operators eliminated)
35DEFINE_EXPR_G_GG(sub_sat,33DEFINE_EXPR_G_GG(sub_sat,
36 (sgtype)-1 < (sgtype)0 ?34 (sgtype)-1 < (sgtype)0 ?
37 /* signed */35 /* signed */
3836
=== modified file 'lib/llvmopencl/Makefile.am'
--- lib/llvmopencl/Makefile.am 2012-06-15 12:26:40 +0000
+++ lib/llvmopencl/Makefile.am 2012-08-15 20:36:22 +0000
@@ -40,4 +40,4 @@
40 ImplicitLoopBarriers.h ImplicitLoopBarriers.cc \40 ImplicitLoopBarriers.h ImplicitLoopBarriers.cc \
41 WorkItemAliasAnalysis.cc WIVectorize.cc41 WorkItemAliasAnalysis.cc WIVectorize.cc
4242
43#llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@
44\ No newline at end of file43\ No newline at end of file
44#llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@
4545
=== modified file 'tests/regression/test_constant_array.cpp'
--- tests/regression/test_constant_array.cpp 2012-08-02 14:18:02 +0000
+++ tests/regression/test_constant_array.cpp 2012-08-15 20:36:22 +0000
@@ -29,6 +29,7 @@
29#include <cstdio>29#include <cstdio>
30#include <cstdlib>30#include <cstdlib>
31#include <iostream>31#include <iostream>
32#include <unistd.h>
3233
33#define WORK_ITEMS 134#define WORK_ITEMS 1
3435
3536
=== modified file 'tests/regression/test_infinite_loop.cpp'
--- tests/regression/test_infinite_loop.cpp 2012-08-02 14:18:02 +0000
+++ tests/regression/test_infinite_loop.cpp 2012-08-15 20:36:22 +0000
@@ -28,6 +28,7 @@
28#include <cstdio>28#include <cstdio>
29#include <cstdlib>29#include <cstdlib>
30#include <iostream>30#include <iostream>
31#include <unistd.h>
3132
32#define WORK_ITEMS 133#define WORK_ITEMS 1
3334