Merge lp:~schnetter/pocl/pocl into lp:~pocl/pocl/trunk
- pocl
- Merge into trunk
Status: | Merged |
---|---|
Merged at revision: | 335 |
Proposed branch: | lp:~schnetter/pocl/pocl |
Merge into: | lp:~pocl/pocl/trunk |
Diff against target: |
1371 lines (+573/-231) 27 files modified
INSTALL (+5/-1) include/_kernel.h (+35/-0) lib/CL/clEnqueueCopyBuffer.c (+18/-2) lib/CL/clEnqueueCopyBufferRect.c (+44/-1) lib/CL/clEnqueueNDRangeKernel.c (+2/-0) lib/CL/clEnqueueReadBuffer.c (+15/-5) lib/CL/clEnqueueReadBufferRect.c (+52/-12) lib/CL/clEnqueueUnmapMemObject.c (+1/-1) lib/CL/clEnqueueWriteBuffer.c (+12/-17) lib/CL/clEnqueueWriteBufferRect.c (+32/-20) lib/CL/clFinish.c (+6/-1) lib/CL/clGetDeviceInfo.c (+21/-21) lib/CL/clGetKernelWorkGroupInfo.c (+1/-1) lib/CL/clReleaseProgram.c (+1/-1) lib/CL/devices/basic/basic.h (+10/-0) lib/CL/devices/common.h (+1/-0) lib/CL/devices/devices.c (+2/-2) lib/CL/devices/pthread/pocl-pthread.h (+10/-0) lib/CL/pocl_cl.h (+13/-1) lib/kernel/atomics.cl (+149/-0) lib/kernel/hadd.cl (+0/-2) lib/kernel/rhadd.cl (+0/-2) lib/kernel/sources.mk (+140/-138) lib/kernel/sub_sat.cl (+0/-2) lib/llvmopencl/Makefile.am (+1/-1) tests/regression/test_constant_array.cpp (+1/-0) tests/regression/test_infinite_loop.cpp (+1/-0) |
To merge this branch: | bzr merge lp:~schnetter/pocl/pocl |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
pocl maintaners | Pending | ||
Review via email:
|
Commit message
Description of the change
I corrected several build errors (and some cosmetic issues) for Mac OS X.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
I'm still relatively unfamiliar with bzr. How exactly would I "commit directly to 0.6"? I assume I check out the 0.6 branch first, and then what? Would "bzr merge -i" do the trick that allows me to select the fixes?
Should I commit my cosmetic changes as well, or should I commit those to the trunk only?
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlo
IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
I cherry-picked the changes into a local checkout of the 0.6 branch.
However, it appears as if I only received the changes, not the
commits? I now have a large number of "locally modified" files... Is
this to be expected? How should I handle this?
-erik
On Mon, Aug 13, 2012 at 11:33 PM, Pekka Jääskeläinen
<email address hidden> wrote:
> Yes, check out 0.6 (e.g. 'bzr co lp:pocl/0.6 pocl-0.6') then do a cherry pick merge from your branch (bzr merge -c revno ../yourbranchlo
>
> IMO code base cleanups (cosmetic) are OK to commit to 0.6 as long as we do not introduce instability/
> --
> https:/
> You are the owner of lp:~schnetter/pocl/pocl.
--
Erik Schnetter <email address hidden>
http://
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
That's normal. The "cherry picking" of Bazaar is not perfect in this regard. http://
- 334. By Erik Schnetter
-
Correct several build errors for Mac OS X
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Erik Schnetter (schnetter) wrote : | # |
I committed my changes (except adding the atomics) to the 0.6 branch, then merged the branch into the trunk.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Pekka Jääskeläinen (pekka-jaaskelainen) wrote : | # |
I can't see them in 0.6. Yep, better put the atomics to trunk, let's try to keep 0.6 in feature freeze.
- 335. By Erik Schnetter
-
Implement atomics
Preview Diff
1 | === modified file 'INSTALL' | |||
2 | --- INSTALL 2012-08-08 12:45:17 +0000 | |||
3 | +++ INSTALL 2012-08-15 20:36:22 +0000 | |||
4 | @@ -36,10 +36,14 @@ | |||
5 | 36 | ./configure --enable-shared --prefix=YOUR_INSTALLATION_PREFIX_HERE | 36 | ./configure --enable-shared --prefix=YOUR_INSTALLATION_PREFIX_HERE |
6 | 37 | make REQUIRES_RTTI=1 && make install | 37 | make REQUIRES_RTTI=1 && make install |
7 | 38 | 38 | ||
8 | 39 | |||
9 | 40 | !!!NOTE: A 64-bit host is at the moment required. See | 39 | !!!NOTE: A 64-bit host is at the moment required. See |
10 | 41 | https://bugs.launchpad.net/pocl/+bug/911911 | 40 | https://bugs.launchpad.net/pocl/+bug/911911 |
11 | 42 | 41 | ||
12 | 42 | !!!NOTE: On Mac OS X, you cannot build LLVM with LLVM; you need to | ||
13 | 43 | build using gcc. Use e.g. this command to configure instead: | ||
14 | 44 | |||
15 | 45 | ./configure --enable-shared CC=gcc CXX=g++ --prefix=YOUR_INSTALLATION_PREFIX_HERE | ||
16 | 46 | |||
17 | 43 | After all the requirements are installed. The installation procedure | 47 | After all the requirements are installed. The installation procedure |
18 | 44 | follows the usual autotools build+install. If you are using a development | 48 | follows the usual autotools build+install. If you are using a development |
19 | 45 | source tree, you need to generate the autotool build files with | 49 | source tree, you need to generate the autotool build files with |
20 | 46 | 50 | ||
21 | === modified file 'include/_kernel.h' | |||
22 | --- include/_kernel.h 2012-05-25 16:05:51 +0000 | |||
23 | +++ include/_kernel.h 2012-08-15 20:36:22 +0000 | |||
24 | @@ -1608,6 +1608,41 @@ | |||
25 | 1608 | #endif | 1608 | #endif |
26 | 1609 | 1609 | ||
27 | 1610 | 1610 | ||
28 | 1611 | 1611 | ||
29 | 1612 | /* Atomic operations */ | ||
30 | 1613 | |||
31 | 1614 | #define _CL_DECLARE_ATOMICS(MOD, TYPE) \ | ||
32 | 1615 | _cl_overloadable TYPE atomic_add (volatile MOD TYPE *p, TYPE val); \ | ||
33 | 1616 | _cl_overloadable TYPE atomic_sub (volatile MOD TYPE *p, TYPE val); \ | ||
34 | 1617 | _cl_overloadable TYPE atomic_xchg (volatile MOD TYPE *p, TYPE val); \ | ||
35 | 1618 | _cl_overloadable TYPE atomic_inc (volatile MOD TYPE *p); \ | ||
36 | 1619 | _cl_overloadable TYPE atomic_dec (volatile MOD TYPE *p); \ | ||
37 | 1620 | _cl_overloadable TYPE atomic_cmpxchg(volatile MOD TYPE *p, TYPE cmp, TYPE val); \ | ||
38 | 1621 | _cl_overloadable TYPE atomic_min (volatile MOD TYPE *p, TYPE val); \ | ||
39 | 1622 | _cl_overloadable TYPE atomic_max (volatile MOD TYPE *p, TYPE val); \ | ||
40 | 1623 | _cl_overloadable TYPE atomic_and (volatile MOD TYPE *p, TYPE val); \ | ||
41 | 1624 | _cl_overloadable TYPE atomic_or (volatile MOD TYPE *p, TYPE val); \ | ||
42 | 1625 | _cl_overloadable TYPE atomic_xor (volatile MOD TYPE *p, TYPE val); | ||
43 | 1626 | _CL_DECLARE_ATOMICS(__global, int ) | ||
44 | 1627 | _CL_DECLARE_ATOMICS(__global, uint) | ||
45 | 1628 | _CL_DECLARE_ATOMICS(__local , int ) | ||
46 | 1629 | _CL_DECLARE_ATOMICS(__local , uint) | ||
47 | 1630 | |||
48 | 1631 | _cl_overloadable float atomic_xchg(volatile __global float *p, float val); | ||
49 | 1632 | _cl_overloadable float atomic_xchg(volatile __local float *p, float val); | ||
50 | 1633 | |||
51 | 1634 | #define atom_add atomic_add | ||
52 | 1635 | #define atom_sub atomic_sub | ||
53 | 1636 | #define atom_xchg atomic_xchg | ||
54 | 1637 | #define atom_inc atomic_inc | ||
55 | 1638 | #define atom_dec atomic_dec | ||
56 | 1639 | #define atom_cmpxchg atomic_cmpxchg | ||
57 | 1640 | #define atom_min atomic_min | ||
58 | 1641 | #define atom_max atomic_max | ||
59 | 1642 | #define atom_and atomic_and | ||
60 | 1643 | #define atom_or atomic_or | ||
61 | 1644 | #define atom_xor atomic_xor | ||
62 | 1645 | |||
63 | 1646 | |||
64 | 1612 | 1647 | ||
65 | 1613 | /* Miscellaneous Vector Functions */ | 1648 | /* Miscellaneous Vector Functions */ |
66 | 1614 | 1649 | ||
67 | 1615 | // This code leads to an ICE in Clang | 1650 | // This code leads to an ICE in Clang |
68 | 1616 | 1651 | ||
69 | === modified file 'lib/CL/clEnqueueCopyBuffer.c' | |||
70 | --- lib/CL/clEnqueueCopyBuffer.c 2012-05-14 11:45:48 +0000 | |||
71 | +++ lib/CL/clEnqueueCopyBuffer.c 2012-08-15 20:36:22 +0000 | |||
72 | @@ -23,6 +23,7 @@ | |||
73 | 23 | */ | 23 | */ |
74 | 24 | 24 | ||
75 | 25 | #include "pocl_cl.h" | 25 | #include "pocl_cl.h" |
76 | 26 | #include "pocl_icd.h" | ||
77 | 26 | #include "utlist.h" | 27 | #include "utlist.h" |
78 | 27 | #include <assert.h> | 28 | #include <assert.h> |
79 | 28 | 29 | ||
80 | @@ -55,14 +56,28 @@ | |||
81 | 55 | return CL_INVALID_VALUE; | 56 | return CL_INVALID_VALUE; |
82 | 56 | 57 | ||
83 | 57 | device_id = command_queue->device; | 58 | device_id = command_queue->device; |
84 | 59 | |||
85 | 58 | for (i = 0; i < command_queue->context->num_devices; ++i) | 60 | for (i = 0; i < command_queue->context->num_devices; ++i) |
86 | 59 | { | 61 | { |
87 | 60 | if (command_queue->context->devices[i] == device_id) | 62 | if (command_queue->context->devices[i] == device_id) |
88 | 61 | break; | 63 | break; |
89 | 62 | } | 64 | } |
90 | 63 | |||
91 | 64 | assert(i < command_queue->context->num_devices); | 65 | assert(i < command_queue->context->num_devices); |
92 | 65 | 66 | ||
93 | 67 | if (event != NULL) | ||
94 | 68 | { | ||
95 | 69 | *event = (cl_event)malloc(sizeof(struct _cl_event)); | ||
96 | 70 | if (*event == NULL) | ||
97 | 71 | return CL_OUT_OF_HOST_MEMORY; | ||
98 | 72 | POCL_INIT_OBJECT(*event); | ||
99 | 73 | (*event)->queue = command_queue; | ||
100 | 74 | POCL_INIT_ICD_OBJECT(*event); | ||
101 | 75 | clRetainCommandQueue (command_queue); | ||
102 | 76 | |||
103 | 77 | POCL_PROFILE_QUEUED; | ||
104 | 78 | } | ||
105 | 79 | |||
106 | 80 | |||
107 | 66 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); | 81 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); |
108 | 67 | if (cmd == NULL) | 82 | if (cmd == NULL) |
109 | 68 | return CL_OUT_OF_HOST_MEMORY; | 83 | return CL_OUT_OF_HOST_MEMORY; |
110 | @@ -81,8 +96,9 @@ | |||
111 | 81 | cmd->command.copy.dst_ptr = dst_buffer->device_ptrs[device_id->dev_id] + dst_offset; | 96 | cmd->command.copy.dst_ptr = dst_buffer->device_ptrs[device_id->dev_id] + dst_offset; |
112 | 82 | cmd->command.copy.cb = cb; | 97 | cmd->command.copy.cb = cb; |
113 | 83 | cmd->next = NULL; | 98 | cmd->next = NULL; |
114 | 99 | cmd->event = event ? *event : NULL; | ||
115 | 84 | 100 | ||
117 | 85 | LL_APPEND(command_queue->root, cmd ); | 101 | LL_APPEND(command_queue->root, cmd); |
118 | 86 | 102 | ||
119 | 87 | return CL_SUCCESS; | 103 | return CL_SUCCESS; |
120 | 88 | } | 104 | } |
121 | 89 | 105 | ||
122 | === modified file 'lib/CL/clEnqueueCopyBufferRect.c' | |||
123 | --- lib/CL/clEnqueueCopyBufferRect.c 2012-05-14 11:45:48 +0000 | |||
124 | +++ lib/CL/clEnqueueCopyBufferRect.c 2012-08-15 20:36:22 +0000 | |||
125 | @@ -22,6 +22,7 @@ | |||
126 | 22 | */ | 22 | */ |
127 | 23 | 23 | ||
128 | 24 | #include "pocl_cl.h" | 24 | #include "pocl_cl.h" |
129 | 25 | #include "pocl_icd.h" | ||
130 | 25 | #include <assert.h> | 26 | #include <assert.h> |
131 | 26 | 27 | ||
132 | 27 | CL_API_ENTRY cl_int CL_API_CALL | 28 | CL_API_ENTRY cl_int CL_API_CALL |
133 | @@ -69,14 +70,51 @@ | |||
134 | 69 | return CL_INVALID_VALUE; | 70 | return CL_INVALID_VALUE; |
135 | 70 | 71 | ||
136 | 71 | device_id = command_queue->device; | 72 | device_id = command_queue->device; |
137 | 73 | |||
138 | 72 | for (i = 0; i < command_queue->context->num_devices; ++i) | 74 | for (i = 0; i < command_queue->context->num_devices; ++i) |
139 | 73 | { | 75 | { |
140 | 74 | if (command_queue->context->devices[i] == device_id) | 76 | if (command_queue->context->devices[i] == device_id) |
141 | 75 | break; | 77 | break; |
142 | 76 | } | 78 | } |
143 | 77 | |||
144 | 78 | assert(i < command_queue->context->num_devices); | 79 | assert(i < command_queue->context->num_devices); |
145 | 79 | 80 | ||
146 | 81 | if (event != NULL) | ||
147 | 82 | { | ||
148 | 83 | *event = (cl_event)malloc(sizeof(struct _cl_event)); | ||
149 | 84 | if (*event == NULL) | ||
150 | 85 | return CL_OUT_OF_HOST_MEMORY; | ||
151 | 86 | POCL_INIT_OBJECT(*event); | ||
152 | 87 | (*event)->queue = command_queue; | ||
153 | 88 | POCL_INIT_ICD_OBJECT(*event); | ||
154 | 89 | |||
155 | 90 | clRetainCommandQueue (command_queue); | ||
156 | 91 | |||
157 | 92 | POCL_PROFILE_QUEUED; | ||
158 | 93 | } | ||
159 | 94 | |||
160 | 95 | |||
161 | 96 | /* execute directly */ | ||
162 | 97 | /* TODO: enqueue the read_rect if this is a non-blocking read (see | ||
163 | 98 | clEnqueueReadBuffer) */ | ||
164 | 99 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) | ||
165 | 100 | { | ||
166 | 101 | /* wait for the event in event_wait_list to finish */ | ||
167 | 102 | POCL_ABORT_UNIMPLEMENTED(); | ||
168 | 103 | } | ||
169 | 104 | else | ||
170 | 105 | { | ||
171 | 106 | /* in-order queue - all previously enqueued commands must | ||
172 | 107 | * finish before this read */ | ||
173 | 108 | // ensure our buffer is not freed yet | ||
174 | 109 | clRetainMemObject (src_buffer); | ||
175 | 110 | clRetainMemObject (dst_buffer); | ||
176 | 111 | clFinish(command_queue); | ||
177 | 112 | } | ||
178 | 113 | POCL_PROFILE_SUBMITTED; | ||
179 | 114 | POCL_PROFILE_RUNNING; | ||
180 | 115 | |||
181 | 116 | /* TODO: offset computation doesn't work in case the ptr is not | ||
182 | 117 | a direct pointer */ | ||
183 | 80 | device_id->copy_rect(device_id->data, | 118 | device_id->copy_rect(device_id->data, |
184 | 81 | src_buffer->device_ptrs[device_id->dev_id], | 119 | src_buffer->device_ptrs[device_id->dev_id], |
185 | 82 | dst_buffer->device_ptrs[device_id->dev_id], | 120 | dst_buffer->device_ptrs[device_id->dev_id], |
186 | @@ -84,5 +122,10 @@ | |||
187 | 84 | src_row_pitch, src_slice_pitch, | 122 | src_row_pitch, src_slice_pitch, |
188 | 85 | dst_row_pitch, dst_slice_pitch); | 123 | dst_row_pitch, dst_slice_pitch); |
189 | 86 | 124 | ||
190 | 125 | POCL_PROFILE_COMPLETE; | ||
191 | 126 | |||
192 | 127 | clReleaseMemObject (src_buffer); | ||
193 | 128 | clReleaseMemObject (dst_buffer); | ||
194 | 129 | |||
195 | 87 | return CL_SUCCESS; | 130 | return CL_SUCCESS; |
196 | 88 | } | 131 | } |
197 | 89 | 132 | ||
198 | === modified file 'lib/CL/clEnqueueNDRangeKernel.c' | |||
199 | --- lib/CL/clEnqueueNDRangeKernel.c 2012-05-30 14:10:44 +0000 | |||
200 | +++ lib/CL/clEnqueueNDRangeKernel.c 2012-08-15 20:36:22 +0000 | |||
201 | @@ -314,6 +314,8 @@ | |||
202 | 314 | } | 314 | } |
203 | 315 | } | 315 | } |
204 | 316 | 316 | ||
205 | 317 | command_node->event = event ? *event : NULL; | ||
206 | 318 | |||
207 | 317 | LL_APPEND(command_queue->root, command_node); | 319 | LL_APPEND(command_queue->root, command_node); |
208 | 318 | 320 | ||
209 | 319 | return CL_SUCCESS; | 321 | return CL_SUCCESS; |
210 | 320 | 322 | ||
211 | === modified file 'lib/CL/clEnqueueReadBuffer.c' | |||
212 | --- lib/CL/clEnqueueReadBuffer.c 2012-05-29 12:16:56 +0000 | |||
213 | +++ lib/CL/clEnqueueReadBuffer.c 2012-08-15 20:36:22 +0000 | |||
214 | @@ -38,6 +38,7 @@ | |||
215 | 38 | cl_event *event) CL_API_SUFFIX__VERSION_1_0 | 38 | cl_event *event) CL_API_SUFFIX__VERSION_1_0 |
216 | 39 | { | 39 | { |
217 | 40 | cl_device_id device; | 40 | cl_device_id device; |
218 | 41 | unsigned i; | ||
219 | 41 | 42 | ||
220 | 42 | if (command_queue == NULL) | 43 | if (command_queue == NULL) |
221 | 43 | return CL_INVALID_COMMAND_QUEUE; | 44 | return CL_INVALID_COMMAND_QUEUE; |
222 | @@ -54,6 +55,13 @@ | |||
223 | 54 | 55 | ||
224 | 55 | device = command_queue->device; | 56 | device = command_queue->device; |
225 | 56 | 57 | ||
226 | 58 | for (i = 0; i < command_queue->context->num_devices; ++i) | ||
227 | 59 | { | ||
228 | 60 | if (command_queue->context->devices[i] == device) | ||
229 | 61 | break; | ||
230 | 62 | } | ||
231 | 63 | assert(i < command_queue->context->num_devices); | ||
232 | 64 | |||
233 | 57 | if (event != NULL) | 65 | if (event != NULL) |
234 | 58 | { | 66 | { |
235 | 59 | *event = (cl_event)malloc(sizeof(struct _cl_event)); | 67 | *event = (cl_event)malloc(sizeof(struct _cl_event)); |
236 | @@ -62,14 +70,15 @@ | |||
237 | 62 | POCL_INIT_OBJECT(*event); | 70 | POCL_INIT_OBJECT(*event); |
238 | 63 | (*event)->queue = command_queue; | 71 | (*event)->queue = command_queue; |
239 | 64 | POCL_INIT_ICD_OBJECT(*event); | 72 | POCL_INIT_ICD_OBJECT(*event); |
240 | 65 | |||
241 | 66 | clRetainCommandQueue (command_queue); | 73 | clRetainCommandQueue (command_queue); |
242 | 67 | 74 | ||
244 | 68 | POCL_PROFILE_QUEUED; | 75 | POCL_PROFILE_QUEUED; |
245 | 69 | } | 76 | } |
246 | 70 | 77 | ||
247 | 71 | 78 | ||
248 | 72 | /* enqueue the read, or execute directly */ | 79 | /* enqueue the read, or execute directly */ |
249 | 80 | /* TODO: why do we implement both? direct execution seems | ||
250 | 81 | unnecessary. */ | ||
251 | 73 | if (blocking_read) | 82 | if (blocking_read) |
252 | 74 | { | 83 | { |
253 | 75 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) | 84 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) |
254 | @@ -101,16 +110,17 @@ | |||
255 | 101 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); | 110 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); |
256 | 102 | if (cmd == NULL) | 111 | if (cmd == NULL) |
257 | 103 | return CL_OUT_OF_HOST_MEMORY; | 112 | return CL_OUT_OF_HOST_MEMORY; |
259 | 104 | 113 | ||
260 | 105 | cmd->type = CL_COMMAND_TYPE_READ; | 114 | cmd->type = CL_COMMAND_TYPE_READ; |
261 | 106 | cmd->command.read.data = device->data; | 115 | cmd->command.read.data = device->data; |
262 | 107 | cmd->command.read.host_ptr = ptr; | 116 | cmd->command.read.host_ptr = ptr; |
263 | 108 | cmd->command.read.device_ptr = buffer->device_ptrs[device->dev_id]+offset; | 117 | cmd->command.read.device_ptr = buffer->device_ptrs[device->dev_id]+offset; |
264 | 109 | cmd->command.read.cb = cb; | 118 | cmd->command.read.cb = cb; |
265 | 119 | cmd->command.read.buffer = buffer; | ||
266 | 110 | cmd->next = NULL; | 120 | cmd->next = NULL; |
268 | 111 | cmd->command.read.buffer = buffer; | 121 | cmd->event = event ? *event : NULL; |
269 | 112 | clRetainMemObject (buffer); | 122 | clRetainMemObject (buffer); |
271 | 113 | LL_APPEND(command_queue->root, cmd ); | 123 | LL_APPEND(command_queue->root, cmd); |
272 | 114 | } | 124 | } |
273 | 115 | 125 | ||
274 | 116 | return CL_SUCCESS; | 126 | return CL_SUCCESS; |
275 | 117 | 127 | ||
276 | === modified file 'lib/CL/clEnqueueReadBufferRect.c' | |||
277 | --- lib/CL/clEnqueueReadBufferRect.c 2012-05-25 16:05:51 +0000 | |||
278 | +++ lib/CL/clEnqueueReadBufferRect.c 2012-08-15 20:36:22 +0000 | |||
279 | @@ -22,6 +22,7 @@ | |||
280 | 22 | */ | 22 | */ |
281 | 23 | 23 | ||
282 | 24 | #include "pocl_cl.h" | 24 | #include "pocl_cl.h" |
283 | 25 | #include "pocl_icd.h" | ||
284 | 25 | #include <assert.h> | 26 | #include <assert.h> |
285 | 26 | #include <stdio.h> | 27 | #include <stdio.h> |
286 | 27 | 28 | ||
287 | @@ -41,7 +42,7 @@ | |||
288 | 41 | const cl_event *event_wait_list, | 42 | const cl_event *event_wait_list, |
289 | 42 | cl_event *event) CL_API_SUFFIX__VERSION_1_1 | 43 | cl_event *event) CL_API_SUFFIX__VERSION_1_1 |
290 | 43 | { | 44 | { |
292 | 44 | cl_device_id device_id; | 45 | cl_device_id device; |
293 | 45 | unsigned i; | 46 | unsigned i; |
294 | 46 | 47 | ||
295 | 47 | if (command_queue == NULL) | 48 | if (command_queue == NULL) |
296 | @@ -58,28 +59,67 @@ | |||
297 | 58 | (host_origin == NULL) || | 59 | (host_origin == NULL) || |
298 | 59 | (region == NULL)) | 60 | (region == NULL)) |
299 | 60 | return CL_INVALID_VALUE; | 61 | return CL_INVALID_VALUE; |
301 | 61 | 62 | ||
302 | 62 | if ((region[0]*region[1]*region[2] > 0) && | 63 | if ((region[0]*region[1]*region[2] > 0) && |
303 | 63 | (buffer_origin[0] + region[0]-1 + | 64 | (buffer_origin[0] + region[0]-1 + |
304 | 64 | buffer_row_pitch * (buffer_origin[1] + region[1]-1) + | 65 | buffer_row_pitch * (buffer_origin[1] + region[1]-1) + |
305 | 65 | buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size)) | 66 | buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size)) |
306 | 66 | return CL_INVALID_VALUE; | 67 | return CL_INVALID_VALUE; |
307 | 67 | 68 | ||
309 | 68 | device_id = command_queue->device; | 69 | device = command_queue->device; |
310 | 70 | |||
311 | 69 | for (i = 0; i < command_queue->context->num_devices; ++i) | 71 | for (i = 0; i < command_queue->context->num_devices; ++i) |
312 | 70 | { | 72 | { |
315 | 71 | if (command_queue->context->devices[i] == device_id) | 73 | if (command_queue->context->devices[i] == device) |
316 | 72 | break; | 74 | break; |
317 | 73 | } | 75 | } |
318 | 74 | |||
319 | 75 | assert(i < command_queue->context->num_devices); | 76 | assert(i < command_queue->context->num_devices); |
320 | 76 | 77 | ||
327 | 77 | device_id->read_rect(device_id->data, ptr, | 78 | if (event != NULL) |
328 | 78 | buffer->device_ptrs[device_id->dev_id], | 79 | { |
329 | 79 | buffer_origin, host_origin, region, | 80 | *event = (cl_event)malloc(sizeof(struct _cl_event)); |
330 | 80 | buffer_row_pitch, buffer_slice_pitch, | 81 | if (*event == NULL) |
331 | 81 | host_row_pitch, host_slice_pitch); | 82 | return CL_OUT_OF_HOST_MEMORY; |
332 | 82 | 83 | POCL_INIT_OBJECT(*event); | |
333 | 84 | (*event)->queue = command_queue; | ||
334 | 85 | POCL_INIT_ICD_OBJECT(*event); | ||
335 | 86 | |||
336 | 87 | clRetainCommandQueue (command_queue); | ||
337 | 88 | |||
338 | 89 | POCL_PROFILE_QUEUED; | ||
339 | 90 | } | ||
340 | 91 | |||
341 | 92 | |||
342 | 93 | /* execute directly */ | ||
343 | 94 | /* TODO: enqueue the read_rect if this is a non-blocking read (see | ||
344 | 95 | clEnqueueReadBuffer) */ | ||
345 | 96 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) | ||
346 | 97 | { | ||
347 | 98 | /* wait for the event in event_wait_list to finish */ | ||
348 | 99 | POCL_ABORT_UNIMPLEMENTED(); | ||
349 | 100 | } | ||
350 | 101 | else | ||
351 | 102 | { | ||
352 | 103 | /* in-order queue - all previously enqueued commands must | ||
353 | 104 | * finish before this read */ | ||
354 | 105 | // ensure our buffer is not freed yet | ||
355 | 106 | clRetainMemObject (buffer); | ||
356 | 107 | clFinish(command_queue); | ||
357 | 108 | } | ||
358 | 109 | POCL_PROFILE_SUBMITTED; | ||
359 | 110 | POCL_PROFILE_RUNNING; | ||
360 | 111 | |||
361 | 112 | /* TODO: offset computation doesn't work in case the ptr is not | ||
362 | 113 | a direct pointer */ | ||
363 | 114 | device->read_rect(device->data, ptr, | ||
364 | 115 | buffer->device_ptrs[device->dev_id], | ||
365 | 116 | buffer_origin, host_origin, region, | ||
366 | 117 | buffer_row_pitch, buffer_slice_pitch, | ||
367 | 118 | host_row_pitch, host_slice_pitch); | ||
368 | 119 | |||
369 | 120 | POCL_PROFILE_COMPLETE; | ||
370 | 121 | |||
371 | 122 | clReleaseMemObject (buffer); | ||
372 | 83 | 123 | ||
373 | 84 | return CL_SUCCESS; | 124 | return CL_SUCCESS; |
374 | 85 | } | 125 | } |
375 | 86 | 126 | ||
376 | === modified file 'lib/CL/clEnqueueUnmapMemObject.c' | |||
377 | --- lib/CL/clEnqueueUnmapMemObject.c 2012-05-29 12:16:56 +0000 | |||
378 | +++ lib/CL/clEnqueueUnmapMemObject.c 2012-08-15 20:36:22 +0000 | |||
379 | @@ -104,6 +104,6 @@ | |||
380 | 104 | DL_DELETE(memobj->mappings, mapping); | 104 | DL_DELETE(memobj->mappings, mapping); |
381 | 105 | memobj->map_count--; | 105 | memobj->map_count--; |
382 | 106 | clReleaseMemObject (memobj); | 106 | clReleaseMemObject (memobj); |
384 | 107 | event = NULL; | 107 | |
385 | 108 | return CL_SUCCESS; | 108 | return CL_SUCCESS; |
386 | 109 | } | 109 | } |
387 | 110 | 110 | ||
388 | === modified file 'lib/CL/clEnqueueWriteBuffer.c' | |||
389 | --- lib/CL/clEnqueueWriteBuffer.c 2012-05-29 12:16:56 +0000 | |||
390 | +++ lib/CL/clEnqueueWriteBuffer.c 2012-08-15 20:36:22 +0000 | |||
391 | @@ -37,7 +37,7 @@ | |||
392 | 37 | const cl_event *event_wait_list, | 37 | const cl_event *event_wait_list, |
393 | 38 | cl_event *event) CL_API_SUFFIX__VERSION_1_0 | 38 | cl_event *event) CL_API_SUFFIX__VERSION_1_0 |
394 | 39 | { | 39 | { |
396 | 40 | cl_device_id device_id; | 40 | cl_device_id device; |
397 | 41 | unsigned i; | 41 | unsigned i; |
398 | 42 | 42 | ||
399 | 43 | if (command_queue == NULL) | 43 | if (command_queue == NULL) |
400 | @@ -53,13 +53,13 @@ | |||
401 | 53 | (offset + cb > buffer->size)) | 53 | (offset + cb > buffer->size)) |
402 | 54 | return CL_INVALID_VALUE; | 54 | return CL_INVALID_VALUE; |
403 | 55 | 55 | ||
405 | 56 | device_id = command_queue->device; | 56 | device = command_queue->device; |
406 | 57 | |||
407 | 57 | for (i = 0; i < command_queue->context->num_devices; ++i) | 58 | for (i = 0; i < command_queue->context->num_devices; ++i) |
408 | 58 | { | 59 | { |
411 | 59 | if (command_queue->context->devices[i] == device_id) | 60 | if (command_queue->context->devices[i] == device) |
412 | 60 | break; | 61 | break; |
413 | 61 | } | 62 | } |
414 | 62 | |||
415 | 63 | assert(i < command_queue->context->num_devices); | 63 | assert(i < command_queue->context->num_devices); |
416 | 64 | 64 | ||
417 | 65 | if (event != NULL) | 65 | if (event != NULL) |
418 | @@ -76,6 +76,8 @@ | |||
419 | 76 | } | 76 | } |
420 | 77 | 77 | ||
421 | 78 | /* enqueue the write, or execute directly */ | 78 | /* enqueue the write, or execute directly */ |
422 | 79 | /* TODO: why do we implement both? direct execution seems | ||
423 | 80 | unnecessary. */ | ||
424 | 79 | if (blocking_write) | 81 | if (blocking_write) |
425 | 80 | { | 82 | { |
426 | 81 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) | 83 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) |
427 | @@ -94,17 +96,9 @@ | |||
428 | 94 | POCL_PROFILE_SUBMITTED; | 96 | POCL_PROFILE_SUBMITTED; |
429 | 95 | POCL_PROFILE_RUNNING; | 97 | POCL_PROFILE_RUNNING; |
430 | 96 | /* TODO: fixme. The offset computation must be done at the device driver. */ | 98 | /* TODO: fixme. The offset computation must be done at the device driver. */ |
432 | 97 | device_id->write(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id]+offset, cb); | 99 | device->write(device->data, ptr, buffer->device_ptrs[device->dev_id]+offset, cb); |
433 | 98 | POCL_PROFILE_COMPLETE; | 100 | POCL_PROFILE_COMPLETE; |
434 | 99 | 101 | ||
435 | 100 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && | ||
436 | 101 | event != NULL) | ||
437 | 102 | { | ||
438 | 103 | (*event)->status = CL_COMPLETE; | ||
439 | 104 | (*event)->time_end = | ||
440 | 105 | command_queue->device->get_timer_value(command_queue->device->data); | ||
441 | 106 | } | ||
442 | 107 | |||
443 | 108 | clReleaseMemObject (buffer); | 102 | clReleaseMemObject (buffer); |
444 | 109 | } | 103 | } |
445 | 110 | else | 104 | else |
446 | @@ -112,14 +106,15 @@ | |||
447 | 112 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); | 106 | _cl_command_node * cmd = malloc(sizeof(_cl_command_node)); |
448 | 113 | if (cmd == NULL) | 107 | if (cmd == NULL) |
449 | 114 | return CL_OUT_OF_HOST_MEMORY; | 108 | return CL_OUT_OF_HOST_MEMORY; |
451 | 115 | 109 | ||
452 | 116 | cmd->type = CL_COMMAND_TYPE_WRITE; | 110 | cmd->type = CL_COMMAND_TYPE_WRITE; |
454 | 117 | cmd->command.write.data = device_id->data; | 111 | cmd->command.write.data = device->data; |
455 | 118 | cmd->command.write.host_ptr = ptr; | 112 | cmd->command.write.host_ptr = ptr; |
456 | 119 | cmd->command.write.device_ptr = buffer->device_ptrs[i]+offset; | 113 | cmd->command.write.device_ptr = buffer->device_ptrs[i]+offset; |
457 | 120 | cmd->command.write.cb = cb; | 114 | cmd->command.write.cb = cb; |
458 | 115 | cmd->command.write.buffer = buffer; | ||
459 | 121 | cmd->next = NULL; | 116 | cmd->next = NULL; |
461 | 122 | cmd->command.write.buffer = buffer; | 117 | cmd->event = event ? *event : NULL; |
462 | 123 | clRetainMemObject (buffer); | 118 | clRetainMemObject (buffer); |
463 | 124 | 119 | ||
464 | 125 | LL_APPEND(command_queue->root, cmd); | 120 | LL_APPEND(command_queue->root, cmd); |
465 | 126 | 121 | ||
466 | === modified file 'lib/CL/clEnqueueWriteBufferRect.c' | |||
467 | --- lib/CL/clEnqueueWriteBufferRect.c 2012-05-14 11:45:48 +0000 | |||
468 | +++ lib/CL/clEnqueueWriteBufferRect.c 2012-08-15 20:36:22 +0000 | |||
469 | @@ -40,7 +40,7 @@ | |||
470 | 40 | const cl_event *event_wait_list, | 40 | const cl_event *event_wait_list, |
471 | 41 | cl_event *event) CL_API_SUFFIX__VERSION_1_1 | 41 | cl_event *event) CL_API_SUFFIX__VERSION_1_1 |
472 | 42 | { | 42 | { |
474 | 43 | cl_device_id device_id; | 43 | cl_device_id device; |
475 | 44 | unsigned i; | 44 | unsigned i; |
476 | 45 | 45 | ||
477 | 46 | if (command_queue == NULL) | 46 | if (command_queue == NULL) |
478 | @@ -57,43 +57,55 @@ | |||
479 | 57 | (host_origin == NULL) || | 57 | (host_origin == NULL) || |
480 | 58 | (region == NULL)) | 58 | (region == NULL)) |
481 | 59 | return CL_INVALID_VALUE; | 59 | return CL_INVALID_VALUE; |
483 | 60 | 60 | ||
484 | 61 | if ((region[0]*region[1]*region[2] > 0) && | 61 | if ((region[0]*region[1]*region[2] > 0) && |
485 | 62 | (buffer_origin[0] + region[0]-1 + | 62 | (buffer_origin[0] + region[0]-1 + |
486 | 63 | buffer_row_pitch * (buffer_origin[1] + region[1]-1) + | 63 | buffer_row_pitch * (buffer_origin[1] + region[1]-1) + |
487 | 64 | buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size)) | 64 | buffer_slice_pitch * (buffer_origin[2] + region[2]-1) >= buffer->size)) |
488 | 65 | { | 65 | { |
489 | 66 | POCL_ABORT_UNIMPLEMENTED(); | 66 | POCL_ABORT_UNIMPLEMENTED(); |
490 | 67 | #if 0 | ||
491 | 68 | printf("bo=[%d,%d,%d]\n" | ||
492 | 69 | "ho=[%d,%d,%d]\n" | ||
493 | 70 | "re=[%d,%d,%d]\n" | ||
494 | 71 | "bp=[,%d,%d]\n" | ||
495 | 72 | "hp=[,%d,%d]\n" | ||
496 | 73 | "bs=[%d]\n", | ||
497 | 74 | (int)buffer_origin[0], (int)buffer_origin[1], (int)buffer_origin[2], | ||
498 | 75 | (int)host_origin[0], (int)host_origin[1], (int)host_origin[2], | ||
499 | 76 | (int)region[0], (int)region[1], (int)region[2], | ||
500 | 77 | (int)buffer_row_pitch, (int)buffer_slice_pitch, | ||
501 | 78 | (int)host_row_pitch, (int)host_slice_pitch, | ||
502 | 79 | (int)buffer->size); | ||
503 | 80 | #endif | ||
504 | 81 | return CL_INVALID_VALUE; | 67 | return CL_INVALID_VALUE; |
505 | 82 | } | 68 | } |
506 | 83 | 69 | ||
508 | 84 | device_id = command_queue->device; | 70 | device = command_queue->device; |
509 | 71 | |||
510 | 85 | for (i = 0; i < command_queue->context->num_devices; ++i) | 72 | for (i = 0; i < command_queue->context->num_devices; ++i) |
511 | 86 | { | 73 | { |
513 | 87 | if (command_queue->context->devices[i] == device_id) | 74 | if (command_queue->context->devices[i] == device) |
514 | 88 | break; | 75 | break; |
515 | 89 | } | 76 | } |
516 | 90 | |||
517 | 91 | assert(i < command_queue->context->num_devices); | 77 | assert(i < command_queue->context->num_devices); |
518 | 92 | 78 | ||
520 | 93 | device_id->write_rect(device_id->data, ptr, buffer->device_ptrs[device_id->dev_id], | 79 | |
521 | 80 | /* execute directly */ | ||
522 | 81 | /* TODO: enqueue the write_rect if this is a non-blocking read (see | ||
523 | 82 | clEnqueueWriteBuffer) */ | ||
524 | 83 | if (command_queue->properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) | ||
525 | 84 | { | ||
526 | 85 | /* wait for the event in event_wait_list to finish */ | ||
527 | 86 | POCL_ABORT_UNIMPLEMENTED(); | ||
528 | 87 | } | ||
529 | 88 | else | ||
530 | 89 | { | ||
531 | 90 | /* in-order queue - all previously enqueued commands must | ||
532 | 91 | * finish before this read */ | ||
533 | 92 | // ensure our buffer is not freed yet | ||
534 | 93 | clRetainMemObject (buffer); | ||
535 | 94 | clFinish(command_queue); | ||
536 | 95 | } | ||
537 | 96 | POCL_PROFILE_SUBMITTED; | ||
538 | 97 | POCL_PROFILE_RUNNING; | ||
539 | 98 | |||
540 | 99 | /* TODO: offset computation doesn't work in case the ptr is not | ||
541 | 100 | a direct pointer */ | ||
542 | 101 | device->write_rect(device->data, ptr, buffer->device_ptrs[device->dev_id], | ||
543 | 94 | buffer_origin, host_origin, region, | 102 | buffer_origin, host_origin, region, |
544 | 95 | buffer_row_pitch, buffer_slice_pitch, | 103 | buffer_row_pitch, buffer_slice_pitch, |
545 | 96 | host_row_pitch, host_slice_pitch); | 104 | host_row_pitch, host_slice_pitch); |
546 | 97 | 105 | ||
547 | 106 | POCL_PROFILE_COMPLETE; | ||
548 | 107 | |||
549 | 108 | clReleaseMemObject (buffer); | ||
550 | 109 | |||
551 | 98 | return CL_SUCCESS; | 110 | return CL_SUCCESS; |
552 | 99 | } | 111 | } |
553 | 100 | 112 | ||
554 | === modified file 'lib/CL/clFinish.c' | |||
555 | --- lib/CL/clFinish.c 2012-05-31 12:01:27 +0000 | |||
556 | +++ lib/CL/clFinish.c 2012-08-15 20:36:22 +0000 | |||
557 | @@ -41,6 +41,7 @@ | |||
558 | 41 | { | 41 | { |
559 | 42 | case CL_COMMAND_TYPE_READ: | 42 | case CL_COMMAND_TYPE_READ: |
560 | 43 | POCL_PROFILE_SUBMITTED; | 43 | POCL_PROFILE_SUBMITTED; |
561 | 44 | POCL_PROFILE_RUNNING; | ||
562 | 44 | command_queue->device->read | 45 | command_queue->device->read |
563 | 45 | (node->command.read.data, | 46 | (node->command.read.data, |
564 | 46 | node->command.read.host_ptr, | 47 | node->command.read.host_ptr, |
565 | @@ -51,6 +52,7 @@ | |||
566 | 51 | break; | 52 | break; |
567 | 52 | case CL_COMMAND_TYPE_WRITE: | 53 | case CL_COMMAND_TYPE_WRITE: |
568 | 53 | POCL_PROFILE_SUBMITTED; | 54 | POCL_PROFILE_SUBMITTED; |
569 | 55 | POCL_PROFILE_RUNNING; | ||
570 | 54 | command_queue->device->write | 56 | command_queue->device->write |
571 | 55 | (node->command.write.data, | 57 | (node->command.write.data, |
572 | 56 | node->command.write.host_ptr, | 58 | node->command.write.host_ptr, |
573 | @@ -61,6 +63,7 @@ | |||
574 | 61 | break; | 63 | break; |
575 | 62 | case CL_COMMAND_TYPE_COPY: | 64 | case CL_COMMAND_TYPE_COPY: |
576 | 63 | POCL_PROFILE_SUBMITTED; | 65 | POCL_PROFILE_SUBMITTED; |
577 | 66 | POCL_PROFILE_RUNNING; | ||
578 | 64 | command_queue->device->copy | 67 | command_queue->device->copy |
579 | 65 | (node->command.copy.data, | 68 | (node->command.copy.data, |
580 | 66 | node->command.copy.src_ptr, | 69 | node->command.copy.src_ptr, |
581 | @@ -71,9 +74,11 @@ | |||
582 | 71 | clReleaseMemObject (node->command.copy.dst_buffer); | 74 | clReleaseMemObject (node->command.copy.dst_buffer); |
583 | 72 | break; | 75 | break; |
584 | 73 | case CL_COMMAND_TYPE_RUN: | 76 | case CL_COMMAND_TYPE_RUN: |
585 | 74 | POCL_PROFILE_SUBMITTED; | ||
586 | 75 | assert (*event == node->event); | 77 | assert (*event == node->event); |
587 | 78 | POCL_PROFILE_SUBMITTED; | ||
588 | 79 | POCL_PROFILE_RUNNING; | ||
589 | 76 | command_queue->device->run(node->command.run.data, node); | 80 | command_queue->device->run(node->command.run.data, node); |
590 | 81 | POCL_PROFILE_COMPLETE; | ||
591 | 77 | for (i = 0; i < node->command.run.arg_buffer_count; ++i) | 82 | for (i = 0; i < node->command.run.arg_buffer_count; ++i) |
592 | 78 | { | 83 | { |
593 | 79 | cl_mem buf = node->command.run.arg_buffers[i]; | 84 | cl_mem buf = node->command.run.arg_buffers[i]; |
594 | 80 | 85 | ||
595 | === modified file 'lib/CL/clGetDeviceInfo.c' | |||
596 | --- lib/CL/clGetDeviceInfo.c 2012-08-08 12:45:17 +0000 | |||
597 | +++ lib/CL/clGetDeviceInfo.c 2012-08-15 20:36:22 +0000 | |||
598 | @@ -126,23 +126,17 @@ | |||
599 | 126 | } | 126 | } |
600 | 127 | 127 | ||
601 | 128 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: | 128 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: |
604 | 129 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char); | 129 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char); |
603 | 130 | |||
605 | 131 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: | 130 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: |
606 | 132 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short); | 131 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short); |
607 | 133 | |||
608 | 134 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: | 132 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: |
609 | 135 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int); | 133 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int); |
610 | 136 | |||
611 | 137 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: | 134 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: |
612 | 138 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long); | 135 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long); |
613 | 139 | |||
614 | 140 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: | 136 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: |
615 | 141 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float); | 137 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float); |
616 | 142 | |||
617 | 143 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: | 138 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: |
618 | 144 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double); | 139 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double); |
619 | 145 | |||
620 | 146 | case CL_DEVICE_MAX_CLOCK_FREQUENCY : | 140 | case CL_DEVICE_MAX_CLOCK_FREQUENCY : |
621 | 147 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_clock_frequency); | 141 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_clock_frequency); |
622 | 148 | case CL_DEVICE_ADDRESS_BITS : | 142 | case CL_DEVICE_ADDRESS_BITS : |
623 | @@ -186,7 +180,7 @@ | |||
624 | 186 | case CL_DEVICE_MAX_CONSTANT_ARGS : | 180 | case CL_DEVICE_MAX_CONSTANT_ARGS : |
625 | 187 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_constant_args); | 181 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->max_constant_args); |
626 | 188 | case CL_DEVICE_LOCAL_MEM_TYPE : | 182 | case CL_DEVICE_LOCAL_MEM_TYPE : |
628 | 189 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_size); | 183 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->local_mem_type); |
629 | 190 | case CL_DEVICE_LOCAL_MEM_SIZE: | 184 | case CL_DEVICE_LOCAL_MEM_SIZE: |
630 | 191 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->local_mem_size); | 185 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->local_mem_size); |
631 | 192 | case CL_DEVICE_ERROR_CORRECTION_SUPPORT : | 186 | case CL_DEVICE_ERROR_CORRECTION_SUPPORT : |
632 | @@ -218,28 +212,34 @@ | |||
633 | 218 | 212 | ||
634 | 219 | case CL_DEVICE_EXTENSIONS : | 213 | case CL_DEVICE_EXTENSIONS : |
635 | 220 | POCL_RETURN_DEVICE_INFO_STR("cl_khr_fp16"); | 214 | POCL_RETURN_DEVICE_INFO_STR("cl_khr_fp16"); |
638 | 221 | case CL_DEVICE_PLATFORM : break; | 215 | case CL_DEVICE_PLATFORM : |
639 | 222 | case CL_DEVICE_DOUBLE_FP_CONFIG : break; | 216 | { |
640 | 217 | /* Return the first platform id, assuming this is the only | ||
641 | 218 | platform id (which is currently always the case for pocl) */ | ||
642 | 219 | cl_platform_id platform_id; | ||
643 | 220 | clGetPlatformIDs(1, &platform_id, NULL); | ||
644 | 221 | POCL_RETURN_DEVICE_INFO(cl_platform_id, platform_id); | ||
645 | 222 | } | ||
646 | 223 | case CL_DEVICE_DOUBLE_FP_CONFIG : | ||
647 | 224 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_ulong, device->double_fp_config); | ||
648 | 223 | case CL_DEVICE_HALF_FP_CONFIG : break; | 225 | case CL_DEVICE_HALF_FP_CONFIG : break; |
650 | 224 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF : break; | 226 | case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF : |
651 | 227 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_half); | ||
652 | 225 | case CL_DEVICE_HOST_UNIFIED_MEMORY : break; | 228 | case CL_DEVICE_HOST_UNIFIED_MEMORY : break; |
653 | 226 | /* TODO: figure out what the difference between preferred and native | ||
654 | 227 | widths are. And why there is no struct fields 'native_vector...' */ | ||
655 | 228 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR : | 229 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR : |
657 | 229 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_char); | 230 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_char); |
658 | 230 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT : | 231 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT : |
660 | 231 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_short); | 232 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_short); |
661 | 232 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_INT : | 233 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_INT : |
663 | 233 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_int); | 234 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_int); |
664 | 234 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG : | 235 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG : |
666 | 235 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_long); | 236 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_long); |
667 | 236 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT : | 237 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT : |
669 | 237 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_float); | 238 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_float); |
670 | 238 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE : | 239 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE : |
672 | 239 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->preferred_vector_width_double); | 240 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_double); |
673 | 240 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF : | 241 | case CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF : |
676 | 241 | /* TODO: why is there no preferred_vector_width_half? */ | 242 | POCL_RETURN_DEVICE_INFO_WITH_IMPL_CHECK(cl_uint, device->native_vector_width_half); |
675 | 242 | POCL_RETURN_DEVICE_INFO(cl_uint, 0); | ||
677 | 243 | case CL_DEVICE_OPENCL_C_VERSION : | 243 | case CL_DEVICE_OPENCL_C_VERSION : |
678 | 244 | POCL_RETURN_DEVICE_INFO_STR("1.2"); | 244 | POCL_RETURN_DEVICE_INFO_STR("1.2"); |
679 | 245 | } | 245 | } |
680 | 246 | 246 | ||
681 | === modified file 'lib/CL/clGetKernelWorkGroupInfo.c' | |||
682 | --- lib/CL/clGetKernelWorkGroupInfo.c 2012-03-28 11:25:45 +0000 | |||
683 | +++ lib/CL/clGetKernelWorkGroupInfo.c 2012-08-15 20:36:22 +0000 | |||
684 | @@ -65,7 +65,7 @@ | |||
685 | 65 | POCL_ABORT_UNIMPLEMENTED(); | 65 | POCL_ABORT_UNIMPLEMENTED(); |
686 | 66 | 66 | ||
687 | 67 | case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: | 67 | case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: |
689 | 68 | POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple); | 68 | POCL_RETURN_KERNEL_WG_INFO(size_t, device->preferred_wg_size_multiple); |
690 | 69 | 69 | ||
691 | 70 | case CL_KERNEL_LOCAL_MEM_SIZE: | 70 | case CL_KERNEL_LOCAL_MEM_SIZE: |
692 | 71 | POCL_RETURN_KERNEL_WG_INFO(cl_ulong, device->local_mem_size); | 71 | POCL_RETURN_KERNEL_WG_INFO(cl_ulong, device->local_mem_size); |
693 | 72 | 72 | ||
694 | === modified file 'lib/CL/clReleaseProgram.c' | |||
695 | --- lib/CL/clReleaseProgram.c 2012-05-31 12:01:27 +0000 | |||
696 | +++ lib/CL/clReleaseProgram.c 2012-08-15 20:36:22 +0000 | |||
697 | @@ -58,7 +58,7 @@ | |||
698 | 58 | free (program->binary_sizes); | 58 | free (program->binary_sizes); |
699 | 59 | 59 | ||
700 | 60 | env = getenv ("POCL_LEAVE_TEMP_DIRS"); | 60 | env = getenv ("POCL_LEAVE_TEMP_DIRS"); |
702 | 61 | if (!(env != NULL && strlen (env) == 1 && env[0] == '1') && | 61 | if (!(env != NULL && strcmp (env, "1") == 0) && |
703 | 62 | getenv("POCL_TEMP_DIR") == NULL) | 62 | getenv("POCL_TEMP_DIR") == NULL) |
704 | 63 | { | 63 | { |
705 | 64 | remove_directory (program->temp_dir); | 64 | remove_directory (program->temp_dir); |
706 | 65 | 65 | ||
707 | === modified file 'lib/CL/devices/basic/basic.h' | |||
708 | --- lib/CL/devices/basic/basic.h 2012-08-08 14:54:32 +0000 | |||
709 | +++ lib/CL/devices/basic/basic.h 2012-08-15 20:36:22 +0000 | |||
710 | @@ -46,6 +46,15 @@ | |||
711 | 46 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ | 46 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ |
712 | 47 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ | 47 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ |
713 | 48 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ | 48 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ |
714 | 49 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \ | ||
715 | 50 | /* TODO: figure out what the difference between preferred and native widths are. */ \ | ||
716 | 51 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \ | ||
717 | 52 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \ | ||
718 | 53 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \ | ||
719 | 54 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ | ||
720 | 55 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ | ||
721 | 56 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ | ||
722 | 57 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \ | ||
723 | 49 | 0, /* max_clock_frequency */ \ | 58 | 0, /* max_clock_frequency */ \ |
724 | 50 | 0, /* address_bits */ \ | 59 | 0, /* address_bits */ \ |
725 | 51 | 0, /* max_mem_alloc_size */ \ | 60 | 0, /* max_mem_alloc_size */ \ |
726 | @@ -62,6 +71,7 @@ | |||
727 | 62 | 0, /* mem_base_addr_align */ \ | 71 | 0, /* mem_base_addr_align */ \ |
728 | 63 | 0, /* min_data_type_align_size */ \ | 72 | 0, /* min_data_type_align_size */ \ |
729 | 64 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \ | 73 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \ |
730 | 74 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \ | ||
731 | 65 | CL_NONE, /* global_mem_cache_type */ \ | 75 | CL_NONE, /* global_mem_cache_type */ \ |
732 | 66 | 0, /* global_mem_cacheline_size */ \ | 76 | 0, /* global_mem_cacheline_size */ \ |
733 | 67 | 0, /* global_mem_cache_size */ \ | 77 | 0, /* global_mem_cache_size */ \ |
734 | 68 | 78 | ||
735 | === modified file 'lib/CL/devices/common.h' | |||
736 | --- lib/CL/devices/common.h 2012-04-24 13:12:25 +0000 | |||
737 | +++ lib/CL/devices/common.h 2012-08-15 20:36:22 +0000 | |||
738 | @@ -70,5 +70,6 @@ | |||
739 | 70 | #endif | 70 | #endif |
740 | 71 | /* Half is internally represented as short */ | 71 | /* Half is internally represented as short */ |
741 | 72 | #define POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT | 72 | #define POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT |
742 | 73 | #define POCL_DEVICES_NATIVE_VECTOR_WIDTH_HALF POCL_DEVICES_NATIVE_VECTOR_WIDTH_SHORT | ||
743 | 73 | 74 | ||
744 | 74 | #endif | 75 | #endif |
745 | 75 | 76 | ||
746 | === modified file 'lib/CL/devices/devices.c' | |||
747 | --- lib/CL/devices/devices.c 2012-05-31 12:01:27 +0000 | |||
748 | +++ lib/CL/devices/devices.c 2012-08-15 20:36:22 +0000 | |||
749 | @@ -57,7 +57,7 @@ | |||
750 | 57 | pocl_init_devices() | 57 | pocl_init_devices() |
751 | 58 | { | 58 | { |
752 | 59 | const char *device_list; | 59 | const char *device_list; |
754 | 60 | char *ptr, *tofree, *token, *saveptr, *saveptr2; | 60 | char *ptr, *tofree, *token, *saveptr; |
755 | 61 | int i, devcount; | 61 | int i, devcount; |
756 | 62 | if (pocl_num_devices > 0) | 62 | if (pocl_num_devices > 0) |
757 | 63 | return; | 63 | return; |
758 | @@ -83,7 +83,7 @@ | |||
759 | 83 | 83 | ||
760 | 84 | ptr = tofree = strdup(device_list); | 84 | ptr = tofree = strdup(device_list); |
761 | 85 | devcount = 0; | 85 | devcount = 0; |
763 | 86 | while ((token = strtok_r (ptr, " ", &saveptr2)) != NULL) | 86 | while ((token = strtok_r (ptr, " ", &saveptr)) != NULL) |
764 | 87 | { | 87 | { |
765 | 88 | struct _cl_device_id* device_type = NULL; | 88 | struct _cl_device_id* device_type = NULL; |
766 | 89 | 89 | ||
767 | 90 | 90 | ||
768 | === modified file 'lib/CL/devices/pthread/pocl-pthread.h' | |||
769 | --- lib/CL/devices/pthread/pocl-pthread.h 2012-08-08 14:54:32 +0000 | |||
770 | +++ lib/CL/devices/pthread/pocl-pthread.h 2012-08-15 20:36:22 +0000 | |||
771 | @@ -50,6 +50,15 @@ | |||
772 | 50 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ | 50 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ |
773 | 51 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ | 51 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ |
774 | 52 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ | 52 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ |
775 | 53 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF, /* preferred_vector_width_half */ \ | ||
776 | 54 | /* TODO: figure out what the difference between preferred and native widths are. */ \ | ||
777 | 55 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_CHAR , /* preferred_vector_width_char */ \ | ||
778 | 56 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_SHORT , /* preferred_vector_width_short */ \ | ||
779 | 57 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_INT , /* preferred_vector_width_int */ \ | ||
780 | 58 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_LONG , /* preferred_vector_width_long */ \ | ||
781 | 59 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_FLOAT , /* preferred_vector_width_float */ \ | ||
782 | 60 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_DOUBLE, /* preferred_vector_width_double */ \ | ||
783 | 61 | POCL_DEVICES_PREFERRED_VECTOR_WIDTH_HALF , /* preferred_vector_width_half */ \ | ||
784 | 53 | 0, /* max_clock_frequency */ \ | 62 | 0, /* max_clock_frequency */ \ |
785 | 54 | 0, /* address_bits */ \ | 63 | 0, /* address_bits */ \ |
786 | 55 | 0, /* max_mem_alloc_size */ \ | 64 | 0, /* max_mem_alloc_size */ \ |
787 | @@ -66,6 +75,7 @@ | |||
788 | 66 | 0, /* mem_base_addr_align */ \ | 75 | 0, /* mem_base_addr_align */ \ |
789 | 67 | 0, /* min_data_type_align_size */ \ | 76 | 0, /* min_data_type_align_size */ \ |
790 | 68 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \ | 77 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* single_fp_config */ \ |
791 | 78 | CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN, /* double_fp_config */ \ | ||
792 | 69 | CL_NONE, /* global_mem_cache_type */ \ | 79 | CL_NONE, /* global_mem_cache_type */ \ |
793 | 70 | 0, /* global_mem_cacheline_size */ \ | 80 | 0, /* global_mem_cacheline_size */ \ |
794 | 71 | 0, /* global_mem_cache_size */ \ | 81 | 0, /* global_mem_cache_size */ \ |
795 | 72 | 82 | ||
796 | === modified file 'lib/CL/pocl_cl.h' | |||
797 | --- lib/CL/pocl_cl.h 2012-08-08 14:54:32 +0000 | |||
798 | +++ lib/CL/pocl_cl.h 2012-08-15 20:36:22 +0000 | |||
799 | @@ -25,6 +25,7 @@ | |||
800 | 25 | #define POCL_CL_H | 25 | #define POCL_CL_H |
801 | 26 | 26 | ||
802 | 27 | #include "config.h" | 27 | #include "config.h" |
803 | 28 | #include <assert.h> | ||
804 | 28 | #include <stdio.h> | 29 | #include <stdio.h> |
805 | 29 | #include <ltdl.h> | 30 | #include <ltdl.h> |
806 | 30 | #include <pthread.h> | 31 | #include <pthread.h> |
807 | @@ -150,6 +151,14 @@ | |||
808 | 150 | cl_uint preferred_vector_width_long; | 151 | cl_uint preferred_vector_width_long; |
809 | 151 | cl_uint preferred_vector_width_float; | 152 | cl_uint preferred_vector_width_float; |
810 | 152 | cl_uint preferred_vector_width_double; | 153 | cl_uint preferred_vector_width_double; |
811 | 154 | cl_uint preferred_vector_width_half; | ||
812 | 155 | cl_uint native_vector_width_char; | ||
813 | 156 | cl_uint native_vector_width_short; | ||
814 | 157 | cl_uint native_vector_width_int; | ||
815 | 158 | cl_uint native_vector_width_long; | ||
816 | 159 | cl_uint native_vector_width_float; | ||
817 | 160 | cl_uint native_vector_width_double; | ||
818 | 161 | cl_uint native_vector_width_half; | ||
819 | 153 | cl_uint max_clock_frequency; | 162 | cl_uint max_clock_frequency; |
820 | 154 | cl_uint address_bits; | 163 | cl_uint address_bits; |
821 | 155 | cl_ulong max_mem_alloc_size; | 164 | cl_ulong max_mem_alloc_size; |
822 | @@ -166,6 +175,7 @@ | |||
823 | 166 | cl_uint mem_base_addr_align; | 175 | cl_uint mem_base_addr_align; |
824 | 167 | cl_uint min_data_type_align_size; | 176 | cl_uint min_data_type_align_size; |
825 | 168 | cl_device_fp_config single_fp_config; | 177 | cl_device_fp_config single_fp_config; |
826 | 178 | cl_device_fp_config double_fp_config; | ||
827 | 169 | cl_device_mem_cache_type global_mem_cache_type; | 179 | cl_device_mem_cache_type global_mem_cache_type; |
828 | 170 | cl_uint global_mem_cacheline_size; | 180 | cl_uint global_mem_cacheline_size; |
829 | 171 | cl_ulong global_mem_cache_size; | 181 | cl_ulong global_mem_cache_size; |
830 | @@ -394,12 +404,12 @@ | |||
831 | 394 | } \ | 404 | } \ |
832 | 395 | } while (0) \ | 405 | } while (0) \ |
833 | 396 | 406 | ||
834 | 397 | |||
835 | 398 | #define POCL_PROFILE_SUBMITTED \ | 407 | #define POCL_PROFILE_SUBMITTED \ |
836 | 399 | do { \ | 408 | do { \ |
837 | 400 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ | 409 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ |
838 | 401 | event != NULL && (*event) != NULL) \ | 410 | event != NULL && (*event) != NULL) \ |
839 | 402 | { \ | 411 | { \ |
840 | 412 | assert((*event)->status = CL_QUEUED); \ | ||
841 | 403 | (*event)->status = CL_SUBMITTED; \ | 413 | (*event)->status = CL_SUBMITTED; \ |
842 | 404 | (*event)->time_submit = \ | 414 | (*event)->time_submit = \ |
843 | 405 | command_queue->device->get_timer_value(command_queue->device->data); \ | 415 | command_queue->device->get_timer_value(command_queue->device->data); \ |
844 | @@ -411,6 +421,7 @@ | |||
845 | 411 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ | 421 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ |
846 | 412 | event != NULL && (*event) != NULL) \ | 422 | event != NULL && (*event) != NULL) \ |
847 | 413 | { \ | 423 | { \ |
848 | 424 | assert((*event)->status = CL_SUBMITTED); \ | ||
849 | 414 | (*event)->status = CL_RUNNING; \ | 425 | (*event)->status = CL_RUNNING; \ |
850 | 415 | (*event)->time_start = \ | 426 | (*event)->time_start = \ |
851 | 416 | command_queue->device->get_timer_value(command_queue->device->data); \ | 427 | command_queue->device->get_timer_value(command_queue->device->data); \ |
852 | @@ -422,6 +433,7 @@ | |||
853 | 422 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ | 433 | if (command_queue->properties & CL_QUEUE_PROFILING_ENABLE && \ |
854 | 423 | event != NULL && (*event) != NULL) \ | 434 | event != NULL && (*event) != NULL) \ |
855 | 424 | { \ | 435 | { \ |
856 | 436 | assert((*event)->status = CL_RUNNING); \ | ||
857 | 425 | (*event)->status = CL_COMPLETE; \ | 437 | (*event)->status = CL_COMPLETE; \ |
858 | 426 | (*event)->time_end = \ | 438 | (*event)->time_end = \ |
859 | 427 | command_queue->device->get_timer_value(command_queue->device->data); \ | 439 | command_queue->device->get_timer_value(command_queue->device->data); \ |
860 | 428 | 440 | ||
861 | === added file 'lib/kernel/atomics.cl' | |||
862 | --- lib/kernel/atomics.cl 1970-01-01 00:00:00 +0000 | |||
863 | +++ lib/kernel/atomics.cl 2012-08-15 20:36:22 +0000 | |||
864 | @@ -0,0 +1,149 @@ | |||
865 | 1 | /* OpenCL built-in library: atomic operations | ||
866 | 2 | |||
867 | 3 | Copyright (c) 2012 Universidad Rey Juan Carlos | ||
868 | 4 | |||
869 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy | ||
870 | 6 | of this software and associated documentation files (the "Software"), to deal | ||
871 | 7 | in the Software without restriction, including without limitation the rights | ||
872 | 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
873 | 9 | copies of the Software, and to permit persons to whom the Software is | ||
874 | 10 | furnished to do so, subject to the following conditions: | ||
875 | 11 | |||
876 | 12 | The above copyright notice and this permission notice shall be included in | ||
877 | 13 | all copies or substantial portions of the Software. | ||
878 | 14 | |||
879 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
880 | 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
881 | 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
882 | 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
883 | 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
884 | 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
885 | 21 | THE SOFTWARE. | ||
886 | 22 | */ | ||
887 | 23 | |||
888 | 24 | |||
889 | 25 | |||
890 | 26 | // Repeat the content of this file several times with different values | ||
891 | 27 | // for Q, T, and U: | ||
892 | 28 | #if !defined(Q) | ||
893 | 29 | |||
894 | 30 | # define Q __global | ||
895 | 31 | # include "atomics.cl" | ||
896 | 32 | # undef Q | ||
897 | 33 | |||
898 | 34 | # define Q __local | ||
899 | 35 | # include "atomics.cl" | ||
900 | 36 | # undef Q | ||
901 | 37 | |||
902 | 38 | #elif !defined(T) | ||
903 | 39 | |||
904 | 40 | # define T int | ||
905 | 41 | # define MIN __sync_fetch_and_min | ||
906 | 42 | # define MAX __sync_fetch_and_max | ||
907 | 43 | # include "atomics.cl" | ||
908 | 44 | # undef T | ||
909 | 45 | # undef MIN | ||
910 | 46 | # undef MAX | ||
911 | 47 | |||
912 | 48 | # define T uint | ||
913 | 49 | # define MIN __sync_fetch_and_umin | ||
914 | 50 | # define MAX __sync_fetch_and_umax | ||
915 | 51 | # include "atomics.cl" | ||
916 | 52 | # undef T | ||
917 | 53 | # undef MIN | ||
918 | 54 | # undef MAX | ||
919 | 55 | |||
920 | 56 | |||
921 | 57 | // xchg is also supported for float as a special case | ||
922 | 58 | __attribute__((overloadable)) | ||
923 | 59 | float atomic_xchg(volatile Q float *p, float val) | ||
924 | 60 | { | ||
925 | 61 | // NOTE: We compare the float as int here... | ||
926 | 62 | return __atomic_exchange_n((volatile int*)p, val, __ATOMIC_RELAXED); | ||
927 | 63 | } | ||
928 | 64 | |||
929 | 65 | #else | ||
930 | 66 | |||
931 | 67 | |||
932 | 68 | |||
933 | 69 | // basic | ||
934 | 70 | |||
935 | 71 | // read, add, store | ||
936 | 72 | __attribute__((overloadable)) | ||
937 | 73 | T atomic_add(volatile Q T *p, T val) | ||
938 | 74 | { | ||
939 | 75 | return __sync_fetch_and_add((volatile T*)p, val, __ATOMIC_RELAXED); | ||
940 | 76 | } | ||
941 | 77 | |||
942 | 78 | // read, subtract, store | ||
943 | 79 | __attribute__((overloadable)) | ||
944 | 80 | T atomic_sub(volatile Q T *p, T val) | ||
945 | 81 | { | ||
946 | 82 | return __sync_fetch_and_sub(p, val, __ATOMIC_RELAXED); | ||
947 | 83 | } | ||
948 | 84 | |||
949 | 85 | // read, swap, store | ||
950 | 86 | __attribute__((overloadable)) | ||
951 | 87 | T atomic_xchg(volatile Q T *p, T val) | ||
952 | 88 | { | ||
953 | 89 | return __atomic_exchange_n(p, val, __ATOMIC_RELAXED); | ||
954 | 90 | } | ||
955 | 91 | |||
956 | 92 | // read, increment, store | ||
957 | 93 | __attribute__((overloadable)) | ||
958 | 94 | T atomic_inc(volatile Q T *p) | ||
959 | 95 | { | ||
960 | 96 | return atomic_add(p, (T)1); | ||
961 | 97 | } | ||
962 | 98 | |||
963 | 99 | // read, decrement, store | ||
964 | 100 | __attribute__((overloadable)) | ||
965 | 101 | T atomic_dec(volatile Q T *p) | ||
966 | 102 | { | ||
967 | 103 | return atomic_sub(p, (T)1); | ||
968 | 104 | } | ||
969 | 105 | |||
970 | 106 | // read, store | ||
971 | 107 | __attribute__((overloadable)) | ||
972 | 108 | T atomic_cmpxchg(volatile Q T *p, T cmp, T val) | ||
973 | 109 | { | ||
974 | 110 | __atomic_compare_exchange_n(p, &cmp, val, false, | ||
975 | 111 | __ATOMIC_RELAXED, __ATOMIC_RELAXED); | ||
976 | 112 | return cmp; | ||
977 | 113 | } | ||
978 | 114 | |||
979 | 115 | // extended | ||
980 | 116 | |||
981 | 117 | __attribute__((overloadable)) | ||
982 | 118 | T atomic_min(volatile Q T *p, T val) | ||
983 | 119 | { | ||
984 | 120 | return MIN((volatile T*)p, val); | ||
985 | 121 | } | ||
986 | 122 | |||
987 | 123 | __attribute__((overloadable)) | ||
988 | 124 | T atomic_max(volatile Q T *p, T val) | ||
989 | 125 | { | ||
990 | 126 | return MAX((volatile T*)p, val); | ||
991 | 127 | } | ||
992 | 128 | |||
993 | 129 | __attribute__((overloadable)) | ||
994 | 130 | T atomic_and(volatile Q T *p, T val) | ||
995 | 131 | { | ||
996 | 132 | return __sync_fetch_and_and(p, val, __ATOMIC_RELAXED); | ||
997 | 133 | } | ||
998 | 134 | |||
999 | 135 | __attribute__((overloadable)) | ||
1000 | 136 | T atomic_or(volatile Q T *p, T val) | ||
1001 | 137 | { | ||
1002 | 138 | return __sync_fetch_and_or(p, val, __ATOMIC_RELAXED); | ||
1003 | 139 | } | ||
1004 | 140 | |||
1005 | 141 | __attribute__((overloadable)) | ||
1006 | 142 | T atomic_xor(volatile Q T *p, T val) | ||
1007 | 143 | { | ||
1008 | 144 | return __sync_fetch_and_xor(p, val, __ATOMIC_RELAXED); | ||
1009 | 145 | } | ||
1010 | 146 | |||
1011 | 147 | |||
1012 | 148 | |||
1013 | 149 | #endif | ||
1014 | 0 | 150 | ||
1015 | === modified file 'lib/kernel/hadd.cl' | |||
1016 | --- lib/kernel/hadd.cl 2011-10-26 13:03:37 +0000 | |||
1017 | +++ lib/kernel/hadd.cl 2012-08-15 20:36:22 +0000 | |||
1018 | @@ -23,7 +23,5 @@ | |||
1019 | 23 | 23 | ||
1020 | 24 | #include "templates.h" | 24 | #include "templates.h" |
1021 | 25 | 25 | ||
1022 | 26 | // This could do with some testing | ||
1023 | 27 | // This could probably also be optimised (i.e. the ?: operators eliminated) | ||
1024 | 28 | DEFINE_EXPR_G_GG(hadd, | 26 | DEFINE_EXPR_G_GG(hadd, |
1025 | 29 | (a >> (sgtype)1) + (b >> (sgtype)1) + (a & b & (gtype)1)) | 27 | (a >> (sgtype)1) + (b >> (sgtype)1) + (a & b & (gtype)1)) |
1026 | 30 | 28 | ||
1027 | === modified file 'lib/kernel/rhadd.cl' | |||
1028 | --- lib/kernel/rhadd.cl 2011-10-26 13:03:37 +0000 | |||
1029 | +++ lib/kernel/rhadd.cl 2012-08-15 20:36:22 +0000 | |||
1030 | @@ -23,7 +23,5 @@ | |||
1031 | 23 | 23 | ||
1032 | 24 | #include "templates.h" | 24 | #include "templates.h" |
1033 | 25 | 25 | ||
1034 | 26 | // This could do with some testing | ||
1035 | 27 | // This could probably also be optimised (i.e. the ?: operators eliminated) | ||
1036 | 28 | DEFINE_EXPR_G_GG(rhadd, | 26 | DEFINE_EXPR_G_GG(rhadd, |
1037 | 29 | (a >> (sgtype)1) + (b >> (sgtype)1) + ((a | b) & (gtype)1)) | 27 | (a >> (sgtype)1) + (b >> (sgtype)1) + ((a | b) & (gtype)1)) |
1038 | 30 | 28 | ||
1039 | === modified file 'lib/kernel/sources.mk' | |||
1040 | --- lib/kernel/sources.mk 2012-06-04 12:15:18 +0000 | |||
1041 | +++ lib/kernel/sources.mk 2012-08-15 20:36:22 +0000 | |||
1042 | @@ -1,144 +1,146 @@ | |||
1043 | 1 | # Nodist here because these files should be included | 1 | # Nodist here because these files should be included |
1044 | 2 | # to the distribution only once, from the root kernel | 2 | # to the distribution only once, from the root kernel |
1045 | 3 | # makefile. | 3 | # makefile. |
1184 | 4 | nodist_libkernel_a_SOURCES = templates.h \ | 4 | nodist_libkernel_a_SOURCES = \ |
1185 | 5 | barrier.ll \ | 5 | templates.h \ |
1186 | 6 | image.h \ | 6 | barrier.ll \ |
1187 | 7 | get_work_dim.c \ | 7 | image.h \ |
1188 | 8 | get_global_size.c \ | 8 | get_work_dim.c \ |
1189 | 9 | get_global_id.c \ | 9 | get_global_size.c \ |
1190 | 10 | get_local_size.c \ | 10 | get_global_id.c \ |
1191 | 11 | get_local_id.c \ | 11 | get_local_size.c \ |
1192 | 12 | get_num_groups.c \ | 12 | get_local_id.c \ |
1193 | 13 | get_group_id.c \ | 13 | get_num_groups.c \ |
1194 | 14 | get_global_offset.c \ | 14 | get_group_id.c \ |
1195 | 15 | as_type.cl \ | 15 | get_global_offset.c \ |
1196 | 16 | convert_type.cl \ | 16 | as_type.cl \ |
1197 | 17 | acos.cl \ | 17 | atomics.cl \ |
1198 | 18 | acosh.cl \ | 18 | convert_type.cl \ |
1199 | 19 | acospi.cl \ | 19 | acos.cl \ |
1200 | 20 | asin.cl \ | 20 | acosh.cl \ |
1201 | 21 | asinh.cl \ | 21 | acospi.cl \ |
1202 | 22 | asinpi.cl \ | 22 | asin.cl \ |
1203 | 23 | atan.cl \ | 23 | asinh.cl \ |
1204 | 24 | atan2.cl \ | 24 | asinpi.cl \ |
1205 | 25 | atan2pi.cl \ | 25 | atan.cl \ |
1206 | 26 | atanh.cl \ | 26 | atan2.cl \ |
1207 | 27 | atanpi.cl \ | 27 | atan2pi.cl \ |
1208 | 28 | cbrt.cl \ | 28 | atanh.cl \ |
1209 | 29 | ceil.cl \ | 29 | atanpi.cl \ |
1210 | 30 | copysign.cl \ | 30 | cbrt.cl \ |
1211 | 31 | cos.cl \ | 31 | ceil.cl \ |
1212 | 32 | cosh.cl \ | 32 | copysign.cl \ |
1213 | 33 | cospi.cl \ | 33 | cos.cl \ |
1214 | 34 | erfc.cl \ | 34 | cosh.cl \ |
1215 | 35 | erf.cl \ | 35 | cospi.cl \ |
1216 | 36 | exp.cl \ | 36 | erfc.cl \ |
1217 | 37 | exp2.cl \ | 37 | erf.cl \ |
1218 | 38 | exp10.cl \ | 38 | exp.cl \ |
1219 | 39 | expm1.cl \ | 39 | exp2.cl \ |
1220 | 40 | fabs.cl \ | 40 | exp10.cl \ |
1221 | 41 | fdim.cl \ | 41 | expm1.cl \ |
1222 | 42 | floor.cl \ | 42 | fabs.cl \ |
1223 | 43 | fma.cl \ | 43 | fdim.cl \ |
1224 | 44 | fmax.cl \ | 44 | floor.cl \ |
1225 | 45 | fmin.cl \ | 45 | fma.cl \ |
1226 | 46 | fmod.cl \ | 46 | fmax.cl \ |
1227 | 47 | fract.cl \ | 47 | fmin.cl \ |
1228 | 48 | hypot.cl \ | 48 | fmod.cl \ |
1229 | 49 | ilogb.cl \ | 49 | fract.cl \ |
1230 | 50 | ldexp.cl \ | 50 | hypot.cl \ |
1231 | 51 | lgamma.cl \ | 51 | ilogb.cl \ |
1232 | 52 | log.cl \ | 52 | ldexp.cl \ |
1233 | 53 | log2.cl \ | 53 | lgamma.cl \ |
1234 | 54 | log10.cl \ | 54 | log.cl \ |
1235 | 55 | log1p.cl \ | 55 | log2.cl \ |
1236 | 56 | logb.cl \ | 56 | log10.cl \ |
1237 | 57 | mad.cl \ | 57 | log1p.cl \ |
1238 | 58 | maxmag.cl \ | 58 | logb.cl \ |
1239 | 59 | minmag.cl \ | 59 | mad.cl \ |
1240 | 60 | nan.cl \ | 60 | maxmag.cl \ |
1241 | 61 | nextafter.cl \ | 61 | minmag.cl \ |
1242 | 62 | pow.cl \ | 62 | nan.cl \ |
1243 | 63 | pown.cl \ | 63 | nextafter.cl \ |
1244 | 64 | powr.cl \ | 64 | pow.cl \ |
1245 | 65 | remainder.cl \ | 65 | pown.cl \ |
1246 | 66 | rint.cl \ | 66 | powr.cl \ |
1247 | 67 | rootn.cl \ | 67 | remainder.cl \ |
1248 | 68 | round.cl \ | 68 | rint.cl \ |
1249 | 69 | rsqrt.cl \ | 69 | rootn.cl \ |
1250 | 70 | sin.cl \ | 70 | round.cl \ |
1251 | 71 | sincos.cl \ | 71 | rsqrt.cl \ |
1252 | 72 | sinh.cl \ | 72 | sin.cl \ |
1253 | 73 | sinpi.cl \ | 73 | sincos.cl \ |
1254 | 74 | sqrt.cl \ | 74 | sinh.cl \ |
1255 | 75 | tan.cl \ | 75 | sinpi.cl \ |
1256 | 76 | tanh.cl \ | 76 | sqrt.cl \ |
1257 | 77 | tanpi.cl \ | 77 | tan.cl \ |
1258 | 78 | tgamma.cl \ | 78 | tanh.cl \ |
1259 | 79 | trunc.cl \ | 79 | tanpi.cl \ |
1260 | 80 | divide.cl \ | 80 | tgamma.cl \ |
1261 | 81 | recip.cl \ | 81 | trunc.cl \ |
1262 | 82 | abs.cl \ | 82 | divide.cl \ |
1263 | 83 | abs_diff.cl \ | 83 | recip.cl \ |
1264 | 84 | add_sat.cl \ | 84 | abs.cl \ |
1265 | 85 | hadd.cl \ | 85 | abs_diff.cl \ |
1266 | 86 | rhadd.cl \ | 86 | add_sat.cl \ |
1267 | 87 | clamp.cl \ | 87 | hadd.cl \ |
1268 | 88 | clz.cl \ | 88 | rhadd.cl \ |
1269 | 89 | mad_hi.cl \ | 89 | clamp.cl \ |
1270 | 90 | mad_sat.cl \ | 90 | clz.cl \ |
1271 | 91 | max.cl \ | 91 | mad_hi.cl \ |
1272 | 92 | min.cl \ | 92 | mad_sat.cl \ |
1273 | 93 | mul_hi.cl \ | 93 | max.cl \ |
1274 | 94 | rotate.cl \ | 94 | min.cl \ |
1275 | 95 | sub_sat.cl \ | 95 | mul_hi.cl \ |
1276 | 96 | upsample.cl \ | 96 | rotate.cl \ |
1277 | 97 | popcount.cl \ | 97 | sub_sat.cl \ |
1278 | 98 | mad24.cl \ | 98 | upsample.cl \ |
1279 | 99 | mul24.cl \ | 99 | popcount.cl \ |
1280 | 100 | degrees.cl \ | 100 | mad24.cl \ |
1281 | 101 | mix.cl \ | 101 | mul24.cl \ |
1282 | 102 | radians.cl \ | 102 | degrees.cl \ |
1283 | 103 | step.cl \ | 103 | mix.cl \ |
1284 | 104 | smoothstep.cl \ | 104 | radians.cl \ |
1285 | 105 | sign.cl \ | 105 | step.cl \ |
1286 | 106 | cross.cl \ | 106 | smoothstep.cl \ |
1287 | 107 | dot.cl \ | 107 | sign.cl \ |
1288 | 108 | distance.cl \ | 108 | cross.cl \ |
1289 | 109 | length.cl \ | 109 | dot.cl \ |
1290 | 110 | normalize.cl \ | 110 | distance.cl \ |
1291 | 111 | fast_distance.cl \ | 111 | length.cl \ |
1292 | 112 | fast_length.cl \ | 112 | normalize.cl \ |
1293 | 113 | fast_normalize.cl \ | 113 | fast_distance.cl \ |
1294 | 114 | isequal.cl \ | 114 | fast_length.cl \ |
1295 | 115 | isnotequal.cl \ | 115 | fast_normalize.cl \ |
1296 | 116 | isgreater.cl \ | 116 | isequal.cl \ |
1297 | 117 | isgreaterequal.cl \ | 117 | isnotequal.cl \ |
1298 | 118 | isless.cl \ | 118 | isgreater.cl \ |
1299 | 119 | islessequal.cl \ | 119 | isgreaterequal.cl \ |
1300 | 120 | islessgreater.cl \ | 120 | isless.cl \ |
1301 | 121 | isfinite.cl \ | 121 | islessequal.cl \ |
1302 | 122 | isinf.cl \ | 122 | islessgreater.cl \ |
1303 | 123 | isnan.cl \ | 123 | isfinite.cl \ |
1304 | 124 | isnormal.cl \ | 124 | isinf.cl \ |
1305 | 125 | isordered.cl \ | 125 | isnan.cl \ |
1306 | 126 | isunordered.cl \ | 126 | isnormal.cl \ |
1307 | 127 | signbit.cl \ | 127 | isordered.cl \ |
1308 | 128 | any.cl \ | 128 | isunordered.cl \ |
1309 | 129 | all.cl \ | 129 | signbit.cl \ |
1310 | 130 | bitselect.cl \ | 130 | any.cl \ |
1311 | 131 | select.cl \ | 131 | all.cl \ |
1312 | 132 | vload.cl \ | 132 | bitselect.cl \ |
1313 | 133 | vstore.cl \ | 133 | select.cl \ |
1314 | 134 | vload_half.cl \ | 134 | vload.cl \ |
1315 | 135 | vstore_half.cl \ | 135 | vstore.cl \ |
1316 | 136 | async_work_group_copy.cl \ | 136 | vload_half.cl \ |
1317 | 137 | wait_group_events.cl \ | 137 | vstore_half.cl \ |
1318 | 138 | read_image.cl \ | 138 | async_work_group_copy.cl \ |
1319 | 139 | write_image.cl \ | 139 | wait_group_events.cl \ |
1320 | 140 | get_image_width.cl \ | 140 | read_image.cl \ |
1321 | 141 | get_image_height.cl | 141 | write_image.cl \ |
1322 | 142 | get_image_width.cl \ | ||
1323 | 143 | get_image_height.cl | ||
1324 | 142 | 144 | ||
1325 | 143 | barrier.o: barrier.ll | 145 | barrier.o: barrier.ll |
1326 | 144 | $(LLVM_AS) -o $@ $< | 146 | $(LLVM_AS) -o $@ $< |
1327 | 145 | 147 | ||
1328 | === modified file 'lib/kernel/sub_sat.cl' | |||
1329 | --- lib/kernel/sub_sat.cl 2011-10-26 19:49:23 +0000 | |||
1330 | +++ lib/kernel/sub_sat.cl 2012-08-15 20:36:22 +0000 | |||
1331 | @@ -30,8 +30,6 @@ | |||
1332 | 30 | // ushort __builtin_ia32_psubusw128 | 30 | // ushort __builtin_ia32_psubusw128 |
1333 | 31 | // Other types don't seem to be supported. | 31 | // Other types don't seem to be supported. |
1334 | 32 | 32 | ||
1335 | 33 | // This could do with some testing | ||
1336 | 34 | // This could probably also be optimised (i.e. the ?: operators eliminated) | ||
1337 | 35 | DEFINE_EXPR_G_GG(sub_sat, | 33 | DEFINE_EXPR_G_GG(sub_sat, |
1338 | 36 | (sgtype)-1 < (sgtype)0 ? | 34 | (sgtype)-1 < (sgtype)0 ? |
1339 | 37 | /* signed */ | 35 | /* signed */ |
1340 | 38 | 36 | ||
1341 | === modified file 'lib/llvmopencl/Makefile.am' | |||
1342 | --- lib/llvmopencl/Makefile.am 2012-06-15 12:26:40 +0000 | |||
1343 | +++ lib/llvmopencl/Makefile.am 2012-08-15 20:36:22 +0000 | |||
1344 | @@ -40,4 +40,4 @@ | |||
1345 | 40 | ImplicitLoopBarriers.h ImplicitLoopBarriers.cc \ | 40 | ImplicitLoopBarriers.h ImplicitLoopBarriers.cc \ |
1346 | 41 | WorkItemAliasAnalysis.cc WIVectorize.cc | 41 | WorkItemAliasAnalysis.cc WIVectorize.cc |
1347 | 42 | 42 | ||
1348 | 43 | #llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@ | ||
1349 | 44 | \ No newline at end of file | 43 | \ No newline at end of file |
1350 | 44 | #llvmopencl_la_LIBADD = @LIBS_LLVMTRANSFORMUTILS@ | ||
1351 | 45 | 45 | ||
1352 | === modified file 'tests/regression/test_constant_array.cpp' | |||
1353 | --- tests/regression/test_constant_array.cpp 2012-08-02 14:18:02 +0000 | |||
1354 | +++ tests/regression/test_constant_array.cpp 2012-08-15 20:36:22 +0000 | |||
1355 | @@ -29,6 +29,7 @@ | |||
1356 | 29 | #include <cstdio> | 29 | #include <cstdio> |
1357 | 30 | #include <cstdlib> | 30 | #include <cstdlib> |
1358 | 31 | #include <iostream> | 31 | #include <iostream> |
1359 | 32 | #include <unistd.h> | ||
1360 | 32 | 33 | ||
1361 | 33 | #define WORK_ITEMS 1 | 34 | #define WORK_ITEMS 1 |
1362 | 34 | 35 | ||
1363 | 35 | 36 | ||
1364 | === modified file 'tests/regression/test_infinite_loop.cpp' | |||
1365 | --- tests/regression/test_infinite_loop.cpp 2012-08-02 14:18:02 +0000 | |||
1366 | +++ tests/regression/test_infinite_loop.cpp 2012-08-15 20:36:22 +0000 | |||
1367 | @@ -28,6 +28,7 @@ | |||
1368 | 28 | #include <cstdio> | 28 | #include <cstdio> |
1369 | 29 | #include <cstdlib> | 29 | #include <cstdlib> |
1370 | 30 | #include <iostream> | 30 | #include <iostream> |
1371 | 31 | #include <unistd.h> | ||
1372 | 31 | 32 | ||
1373 | 32 | #define WORK_ITEMS 1 | 33 | #define WORK_ITEMS 1 |
1374 | 33 | 34 |
Fixes should be committed to the pocl/0.6 branch first and then 0.6 merged to trunk so we get fixes to the 0.6 release. Otherwise, looks good to me. Please commit to the lp:pocl/0.6 branch directly (which we can merge to trunk later). I'm probably going to roll a yet another 0.6 release candidate package later this week.