Merge lp:~ams-codesourcery/gcc-linaro/unaligned-accesses-4.6 into lp:gcc-linaro/4.6
| Status: | Rejected |
|---|---|
| Rejected by: | Michael Hope |
| Proposed branch: | lp:~ams-codesourcery/gcc-linaro/unaligned-accesses-4.6 |
| Merge into: | lp:gcc-linaro/4.6 |
| Diff against target: | 868 lines (+660/-41), 7 files modified |
| To merge this branch: | bzr merge lp:~ams-codesourcery/gcc-linaro/unaligned-accesses-4.6 |
| Related bugs: | |

Files modified: ChangeLog.linaro (+44/-0), gcc/config/arm/arm-protos.h (+1/-0), gcc/config/arm/arm.c (+363/-3), gcc/config/arm/arm.md (+224/-36), gcc/config/arm/arm.opt (+4/-0), gcc/config/arm/constraints.md (+22/-0), gcc/expmed.c (+2/-2)

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Michael Hope | Disapprove | | |
| Richard Sandiford | Needs Fixing | | |

Review via email: mp+64957@code.launchpad.net
Commit message
Description of the change
This is a backport of Julian Brown's unaligned-accesses patches, proposed for GCC 4.7 but not yet approved. It also required a backport of another prerequisite patch.
Original patches:
http://
http://<email address hidden>
http://<email address hidden>
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on i686-lucid-
The build results are available at:
http://
The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106758.
The full testsuite results are at:
http://
cbuild-checked: i686-lucid-
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on x86_64-
The build results are available at:
http://
The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106758:
-PASS: gcc.dg/
+FAIL: gcc.dg/
The full testsuite results are at:
http://
cbuild-checked: x86_64-
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on armv7l-
The build results are available at:
http://
The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106758.
The full testsuite results are at:
http://
cbuild-checked: armv7l-
Michael Hope (michaelh1) wrote:
I'm re-running the x86_64 baseline and build under natty. The same problem showed up in Ramana's branch.
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on x86_64-
The build results are available at:
http://
The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106758.
The full testsuite results are at:
http://
cbuild-checked: x86_64-
Michael Hope (michaelh1) wrote:
Note that after re-running the baseline the regression has cleared.
Michael Hope (michaelh1) wrote:
I ran a little test at -O2 on an A9 and the functionality looks good. Taking this code:
"""
struct foo
{
char pad[2];
int bar;
char pad2[2];
int block[2];
}; // __attribute__((packed))
int get(struct foo* pfoo)
{
return pfoo->bar;
}
void copy(struct foo* pfoo, int* pinto)
{
__builtin_
}
void assign(struct foo* pfoo, struct foo* pinto)
{
*pinto = *pfoo;
}
"""
Without the packed:
* 4.6-2011.06 generates a load; a call to memcpy; and an ldmia/stmia
* This branch generates a load; an inline copy; and an ldmia/stmia
With the 'packed':
* 4.6-2011.06 generates a byte-by-byte load; a call to memcpy; and a call to memcpy
* This branch generates a load; an inline copy; and an ldr-based inline copy
This branch with the packed and -march=armv5te -marm generates the right thing: a byte-by-byte load; a call to memcpy; and a call to memcpy.
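For reference, the packed variant being toggled in the listing above is the standard GCC attribute form; this is a sketch, since the trailing comment in the listing is truncated in this archive:
"""
/* Same struct as above, with the packing attribute applied.  */
struct foo
{
  char pad[2];
  int bar;
  char pad2[2];
  int block[2];
} __attribute__((packed));
"""
With the attribute applied the whole struct has alignment 1 and bar lands at offset 2, so the compiler can no longer assume word alignment for these accesses.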
Ramana Radhakrishnan (ramana) wrote:
The eabi attribute changes look ok to me.
cheers
Ramana
Ulrich Weigand (uweigand) wrote:
The patch in general looks good to me, with the exception of the expmed.c common code change. I agree that common code must have a bug here, but I don't think this patch is actually a correct fix ... I'll need to look into this a bit more.
Ulrich Weigand (uweigand) wrote:
The problematic code in expmed.c looks like this:
  /* On big-endian machines, we count bits from the most significant.
     If the bit field insn does not, we must invert.  */
  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
    xbitpos = unit - bitsize - xbitpos;

  /* We have been counting XBITPOS within UNIT.
     Count instead within the size of the register.  */
  if (BITS_BIG_ENDIAN && !MEM_P (xop0))
    xbitpos += GET_MODE_BITSIZE (op_mode) - unit;

  unit = GET_MODE_BITSIZE (op_mode);
It seems to me that the problem is that the two corrections are performed in reverse order, and it should actually be like this (note that the first if must now check BYTES_BIG_ENDIAN, not BITS_BIG_ENDIAN):
  /* We have been counting XBITPOS within UNIT.
     Count instead within the size of the register.  */
  if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
    xbitpos += GET_MODE_BITSIZE (op_mode) - unit;

  unit = GET_MODE_BITSIZE (op_mode);

  /* On big-endian machines, we count bits from the most significant.
     If the bit field insn does not, we must invert.  */
  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
    xbitpos = unit - bitsize - xbitpos;
This gives the same results as the previous code for all register operations, but now does the right thing for MEM operations on BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN machines ...
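To make the ordering argument concrete, here is a small standalone sketch (not part of the proposed patch; the unit, op_mode size, bitsize and bit position values below are purely illustrative). It evaluates both orderings and shows them agreeing for a register operand and differing by GET_MODE_BITSIZE (op_mode) - unit for a MEM operand when BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN:
"""
/* Standalone sketch, not GCC code: the numeric values are illustrative only.  */
#include <stdio.h>
#include <stdbool.h>

/* Current expmed.c order: invert first, then widen to op_mode.  */
static int old_order (bool bits_be, bool bytes_be, bool is_mem,
                      int unit, int op_mode_bits, int bitsize, int bitpos)
{
  int xbitpos = bitpos;
  if (bits_be != bytes_be)
    xbitpos = unit - bitsize - xbitpos;
  if (bits_be && !is_mem)
    xbitpos += op_mode_bits - unit;
  return xbitpos;
}

/* Proposed order: widen to op_mode first, then invert.  */
static int new_order (bool bits_be, bool bytes_be, bool is_mem,
                      int unit, int op_mode_bits, int bitsize, int bitpos)
{
  int xbitpos = bitpos;
  if (bytes_be && !is_mem)
    xbitpos += op_mode_bits - unit;
  unit = op_mode_bits;
  if (bits_be != bytes_be)
    xbitpos = unit - bitsize - xbitpos;
  return xbitpos;
}

int main (void)
{
  int unit = 32, op_mode_bits = 64, bitsize = 8, bitpos = 16;

  /* Register operand, BITS_BIG_ENDIAN=1, BYTES_BIG_ENDIAN=0: both give 40.  */
  printf ("reg: old=%d new=%d\n",
          old_order (true, false, false, unit, op_mode_bits, bitsize, bitpos),
          new_order (true, false, false, unit, op_mode_bits, bitsize, bitpos));

  /* MEM operand, same endianness settings: old gives 8, new gives 40.  */
  printf ("mem: old=%d new=%d\n",
          old_order (true, false, true, unit, op_mode_bits, bitsize, bitpos),
          new_order (true, false, true, unit, op_mode_bits, bitsize, bitpos));
  return 0;
}
"""
Compiled with any C compiler, this prints "reg: old=40 new=40" and "mem: old=8 new=40" for the chosen values, matching the claim that only the MEM case changes.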
Ulrich Weigand (uweigand) wrote:
Started discussing the expmed.c change on the mailing list here:
http://
Richard Sandiford (rsandifo) wrote:
I think we should include this too:
http://
It'll need a bit of tweaking because 4.5 and 4.6 don't enforce
the movmisalign predicates. Something like:
if (!neon_
{
addr = force_reg (Pmode, XEXP (operands[0], 0));
operands[0] = replace_
}
at the end of the define_expand (and the same for operands[1]).
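Since the snippet above is truncated in this archive, here is a hedged sketch of the shape being described, applied to both operands as suggested. PREDICATE_P is only a placeholder for whatever predicate the truncated text named; force_reg and replace_equiv_address are the usual helpers for rewriting a MEM's address:
"""
/* Sketch only, for the end of the movmisalign define_expand.
   PREDICATE_P is a placeholder, not a real GCC predicate.  */
if (!PREDICATE_P (operands[0], GET_MODE (operands[0])))
  {
    rtx addr = force_reg (Pmode, XEXP (operands[0], 0));
    operands[0] = replace_equiv_address (operands[0], addr);
  }

if (!PREDICATE_P (operands[1], GET_MODE (operands[1])))
  {
    rtx addr = force_reg (Pmode, XEXP (operands[1], 0));
    operands[1] = replace_equiv_address (operands[1], addr);
  }
"""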
Michael Hope (michaelh1) wrote:
Withdrawn. We'll see how Julian does upstream.
Unmerged revisions
- 106761. By Andrew Stubbs
  Unaligned accesses for builtin memcpy.
  Backport from proposed patch for FSF.
- 106760. By Andrew Stubbs
  Unaligned accesses for packed types.
  Backport from proposed FSF patch.
- 106759. By Andrew Stubbs
  Backport prerequisite patch for unaligned access patches.
  Backport from FSF.
Preview Diff
1 | === modified file 'ChangeLog.linaro' |
2 | --- ChangeLog.linaro 2011-06-14 14:09:57 +0000 |
3 | +++ ChangeLog.linaro 2011-06-17 10:04:51 +0000 |
4 | @@ -1,3 +1,47 @@ |
5 | +2011-06-17 Andrew Stubbs <ams@codesourcery.com> |
6 | + |
7 | + Backport proposed patches from gcc-patches@gcc.gnu.org: |
8 | + |
9 | + Julian Brown <julian@codesourcery.com> |
10 | + |
11 | + gcc/ |
12 | + * config/arm/arm.c (arm_block_move_unaligned_straight) |
13 | + (arm_adjust_block_mem, arm_block_move_unaligned_loop) |
14 | + (arm_movmemqi_unaligned): New. |
15 | + (arm_gen_movmemqi): Support unaligned block copies. |
16 | + |
17 | + Julian Brown <julian@codesourcery.com> |
18 | + |
19 | + gcc/ |
20 | + * config/arm/arm.c (arm_override_options): Add unaligned_access |
21 | + support. |
22 | + (arm_file_start): Emit attribute for unaligned access as |
23 | + appropriate. |
24 | + * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD) |
25 | + (UNSPEC_UNALIGNED_STORE): Add constants for unspecs. |
26 | + (insv, extzv): Add unaligned-access support. |
27 | + (extv): Change to expander. Likewise. |
28 | + (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu) |
29 | + (unaligned_storesi, unaligned_storehi): New. |
30 | + (*extv_reg): New (previous extv implementation). |
31 | + * config/arm/arm.opt (munaligned_access): Add option. |
32 | + * config/arm/constraints.md (Uw): New constraint. |
33 | + * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for |
34 | + memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN. |
35 | + (extract_bit_field_1): Likewise. |
36 | + |
37 | + Backport from FSF: |
38 | + |
39 | + 2011-04-19 Wei Guozhi <carrot@google.com> |
40 | + |
41 | + PR target/47855 |
42 | + gcc/ |
43 | + * config/arm/arm-protos.h (thumb1_legitimate_address_p): New prototype. |
44 | + * config/arm/arm.c (thumb1_legitimate_address_p): Remove the static |
45 | + linkage. |
46 | + * config/arm/constraints.md (Uu): New constraint. |
47 | + * config/arm/arm.md (*arm_movqi_insn): Compute attr "length". |
48 | + |
49 | 2011-06-14 Andrew Stubbs <ams@codesourcery.com> |
50 | |
51 | gcc/ |
52 | |
53 | === modified file 'gcc/config/arm/arm-protos.h' |
54 | --- gcc/config/arm/arm-protos.h 2011-05-03 15:17:25 +0000 |
55 | +++ gcc/config/arm/arm-protos.h 2011-06-17 10:04:51 +0000 |
56 | @@ -58,6 +58,7 @@ |
57 | int); |
58 | extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int, |
59 | int); |
60 | +extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int); |
61 | extern int arm_const_double_rtx (rtx); |
62 | extern int neg_const_double_rtx_ok_for_fpa (rtx); |
63 | extern int vfp3_const_double_rtx (rtx); |
64 | |
65 | === modified file 'gcc/config/arm/arm.c' |
66 | --- gcc/config/arm/arm.c 2011-05-11 14:49:48 +0000 |
67 | +++ gcc/config/arm/arm.c 2011-06-17 10:04:51 +0000 |
68 | @@ -1978,6 +1978,28 @@ |
69 | fix_cm3_ldrd = 0; |
70 | } |
71 | |
72 | + /* Enable -munaligned-access by default for |
73 | + - all ARMv6 architecture-based processors |
74 | + - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors. |
75 | + |
76 | + Disable -munaligned-access by default for |
77 | + - all pre-ARMv6 architecture-based processors |
78 | + - ARMv6-M architecture-based processors. */ |
79 | + |
80 | + if (unaligned_access == 2) |
81 | + { |
82 | + if (arm_arch6 && (arm_arch_notm || arm_arch7)) |
83 | + unaligned_access = 1; |
84 | + else |
85 | + unaligned_access = 0; |
86 | + } |
87 | + else if (unaligned_access == 1 |
88 | + && !(arm_arch6 && (arm_arch_notm || arm_arch7))) |
89 | + { |
90 | + warning (0, "target CPU does not support unaligned accesses"); |
91 | + unaligned_access = 0; |
92 | + } |
93 | + |
94 | if (TARGET_THUMB1 && flag_schedule_insns) |
95 | { |
96 | /* Don't warn since it's on by default in -O2. */ |
97 | @@ -5929,7 +5951,7 @@ |
98 | addresses based on the frame pointer or arg pointer until the |
99 | reload pass starts. This is so that eliminating such addresses |
100 | into stack based ones won't produce impossible code. */ |
101 | -static int |
102 | +int |
103 | thumb1_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p) |
104 | { |
105 | /* ??? Not clear if this is right. Experiment. */ |
106 | @@ -10385,6 +10407,335 @@ |
107 | return true; |
108 | } |
109 | |
110 | +/* Copy a block of memory using plain ldr/str/ldrh/strh instructions, to permit |
111 | + unaligned copies on processors which support unaligned semantics for those |
112 | + instructions. INTERLEAVE_FACTOR can be used to attempt to hide load latency |
113 | + (using more registers) by doing e.g. load/load/store/store for a factor of 2. |
114 | + An interleave factor of 1 (the minimum) will perform no interleaving. |
115 | + Load/store multiple are used for aligned addresses where possible. */ |
116 | + |
117 | +static void |
118 | +arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase, |
119 | + HOST_WIDE_INT length, |
120 | + unsigned int interleave_factor) |
121 | +{ |
122 | + rtx *regs = XALLOCAVEC (rtx, interleave_factor); |
123 | + int *regnos = XALLOCAVEC (int, interleave_factor); |
124 | + HOST_WIDE_INT block_size_bytes = interleave_factor * UNITS_PER_WORD; |
125 | + HOST_WIDE_INT i, j; |
126 | + HOST_WIDE_INT remaining = length, words; |
127 | + rtx halfword_tmp = NULL, byte_tmp = NULL; |
128 | + rtx dst, src; |
129 | + bool src_aligned = MEM_ALIGN (srcbase) >= BITS_PER_WORD; |
130 | + bool dst_aligned = MEM_ALIGN (dstbase) >= BITS_PER_WORD; |
131 | + HOST_WIDE_INT srcoffset, dstoffset; |
132 | + HOST_WIDE_INT src_autoinc, dst_autoinc; |
133 | + rtx mem, addr; |
134 | + |
135 | + gcc_assert (1 <= interleave_factor && interleave_factor <= 4); |
136 | + |
137 | + /* Use hard registers if we have aligned source or destination so we can use |
138 | + load/store multiple with contiguous registers. */ |
139 | + if (dst_aligned || src_aligned) |
140 | + for (i = 0; i < interleave_factor; i++) |
141 | + regs[i] = gen_rtx_REG (SImode, i); |
142 | + else |
143 | + for (i = 0; i < interleave_factor; i++) |
144 | + regs[i] = gen_reg_rtx (SImode); |
145 | + |
146 | + dst = copy_addr_to_reg (XEXP (dstbase, 0)); |
147 | + src = copy_addr_to_reg (XEXP (srcbase, 0)); |
148 | + |
149 | + srcoffset = dstoffset = 0; |
150 | + |
151 | + /* Calls to arm_gen_load_multiple and arm_gen_store_multiple update SRC/DST. |
152 | + For copying the last bytes we want to subtract this offset again. */ |
153 | + src_autoinc = dst_autoinc = 0; |
154 | + |
155 | + for (i = 0; i < interleave_factor; i++) |
156 | + regnos[i] = i; |
157 | + |
158 | + /* Copy BLOCK_SIZE_BYTES chunks. */ |
159 | + |
160 | + for (i = 0; i + block_size_bytes <= length; i += block_size_bytes) |
161 | + { |
162 | + /* Load words. */ |
163 | + if (src_aligned && interleave_factor > 1) |
164 | + { |
165 | + emit_insn (arm_gen_load_multiple (regnos, interleave_factor, src, |
166 | + TRUE, srcbase, &srcoffset)); |
167 | + src_autoinc += UNITS_PER_WORD * interleave_factor; |
168 | + } |
169 | + else |
170 | + { |
171 | + for (j = 0; j < interleave_factor; j++) |
172 | + { |
173 | + addr = plus_constant (src, srcoffset + j * UNITS_PER_WORD |
174 | + - src_autoinc); |
175 | + mem = adjust_automodify_address (srcbase, SImode, addr, |
176 | + srcoffset + j * UNITS_PER_WORD); |
177 | + emit_insn (gen_unaligned_loadsi (regs[j], mem)); |
178 | + } |
179 | + srcoffset += block_size_bytes; |
180 | + } |
181 | + |
182 | + /* Store words. */ |
183 | + if (dst_aligned && interleave_factor > 1) |
184 | + { |
185 | + emit_insn (arm_gen_store_multiple (regnos, interleave_factor, dst, |
186 | + TRUE, dstbase, &dstoffset)); |
187 | + dst_autoinc += UNITS_PER_WORD * interleave_factor; |
188 | + } |
189 | + else |
190 | + { |
191 | + for (j = 0; j < interleave_factor; j++) |
192 | + { |
193 | + addr = plus_constant (dst, dstoffset + j * UNITS_PER_WORD |
194 | + - dst_autoinc); |
195 | + mem = adjust_automodify_address (dstbase, SImode, addr, |
196 | + dstoffset + j * UNITS_PER_WORD); |
197 | + emit_insn (gen_unaligned_storesi (mem, regs[j])); |
198 | + } |
199 | + dstoffset += block_size_bytes; |
200 | + } |
201 | + |
202 | + remaining -= block_size_bytes; |
203 | + } |
204 | + |
205 | + /* Copy any whole words left (note these aren't interleaved with any |
206 | + subsequent halfword/byte load/stores in the interests of simplicity). */ |
207 | + |
208 | + words = remaining / UNITS_PER_WORD; |
209 | + |
210 | + gcc_assert (words < interleave_factor); |
211 | + |
212 | + if (src_aligned && words > 1) |
213 | + { |
214 | + emit_insn (arm_gen_load_multiple (regnos, words, src, TRUE, srcbase, |
215 | + &srcoffset)); |
216 | + src_autoinc += UNITS_PER_WORD * words; |
217 | + } |
218 | + else |
219 | + { |
220 | + for (j = 0; j < words; j++) |
221 | + { |
222 | + addr = plus_constant (src, |
223 | + srcoffset + j * UNITS_PER_WORD - src_autoinc); |
224 | + mem = adjust_automodify_address (srcbase, SImode, addr, |
225 | + srcoffset + j * UNITS_PER_WORD); |
226 | + emit_insn (gen_unaligned_loadsi (regs[j], mem)); |
227 | + } |
228 | + srcoffset += words * UNITS_PER_WORD; |
229 | + } |
230 | + |
231 | + if (dst_aligned && words > 1) |
232 | + { |
233 | + emit_insn (arm_gen_store_multiple (regnos, words, dst, TRUE, dstbase, |
234 | + &dstoffset)); |
235 | + dst_autoinc += words * UNITS_PER_WORD; |
236 | + } |
237 | + else |
238 | + { |
239 | + for (j = 0; j < words; j++) |
240 | + { |
241 | + addr = plus_constant (dst, |
242 | + dstoffset + j * UNITS_PER_WORD - dst_autoinc); |
243 | + mem = adjust_automodify_address (dstbase, SImode, addr, |
244 | + dstoffset + j * UNITS_PER_WORD); |
245 | + emit_insn (gen_unaligned_storesi (mem, regs[j])); |
246 | + } |
247 | + dstoffset += words * UNITS_PER_WORD; |
248 | + } |
249 | + |
250 | + remaining -= words * UNITS_PER_WORD; |
251 | + |
252 | + gcc_assert (remaining < 4); |
253 | + |
254 | + /* Copy a halfword if necessary. */ |
255 | + |
256 | + if (remaining >= 2) |
257 | + { |
258 | + halfword_tmp = gen_reg_rtx (SImode); |
259 | + |
260 | + addr = plus_constant (src, srcoffset - src_autoinc); |
261 | + mem = adjust_automodify_address (srcbase, HImode, addr, srcoffset); |
262 | + emit_insn (gen_unaligned_loadhiu (halfword_tmp, mem)); |
263 | + |
264 | + /* Either write out immediately, or delay until we've loaded the last |
265 | + byte, depending on interleave factor. */ |
266 | + if (interleave_factor == 1) |
267 | + { |
268 | + addr = plus_constant (dst, dstoffset - dst_autoinc); |
269 | + mem = adjust_automodify_address (dstbase, HImode, addr, dstoffset); |
270 | + emit_insn (gen_unaligned_storehi (mem, |
271 | + gen_lowpart (HImode, halfword_tmp))); |
272 | + halfword_tmp = NULL; |
273 | + dstoffset += 2; |
274 | + } |
275 | + |
276 | + remaining -= 2; |
277 | + srcoffset += 2; |
278 | + } |
279 | + |
280 | + gcc_assert (remaining < 2); |
281 | + |
282 | + /* Copy last byte. */ |
283 | + |
284 | + if ((remaining & 1) != 0) |
285 | + { |
286 | + byte_tmp = gen_reg_rtx (SImode); |
287 | + |
288 | + addr = plus_constant (src, srcoffset - src_autoinc); |
289 | + mem = adjust_automodify_address (srcbase, QImode, addr, srcoffset); |
290 | + emit_move_insn (gen_lowpart (QImode, byte_tmp), mem); |
291 | + |
292 | + if (interleave_factor == 1) |
293 | + { |
294 | + addr = plus_constant (dst, dstoffset - dst_autoinc); |
295 | + mem = adjust_automodify_address (dstbase, QImode, addr, dstoffset); |
296 | + emit_move_insn (mem, gen_lowpart (QImode, byte_tmp)); |
297 | + byte_tmp = NULL; |
298 | + dstoffset++; |
299 | + } |
300 | + |
301 | + remaining--; |
302 | + srcoffset++; |
303 | + } |
304 | + |
305 | + /* Store last halfword if we haven't done so already. */ |
306 | + |
307 | + if (halfword_tmp) |
308 | + { |
309 | + addr = plus_constant (dst, dstoffset - dst_autoinc); |
310 | + mem = adjust_automodify_address (dstbase, HImode, addr, dstoffset); |
311 | + emit_insn (gen_unaligned_storehi (mem, |
312 | + gen_lowpart (HImode, halfword_tmp))); |
313 | + dstoffset += 2; |
314 | + } |
315 | + |
316 | + /* Likewise for last byte. */ |
317 | + |
318 | + if (byte_tmp) |
319 | + { |
320 | + addr = plus_constant (dst, dstoffset - dst_autoinc); |
321 | + mem = adjust_automodify_address (dstbase, QImode, addr, dstoffset); |
322 | + emit_move_insn (mem, gen_lowpart (QImode, byte_tmp)); |
323 | + dstoffset++; |
324 | + } |
325 | + |
326 | + gcc_assert (remaining == 0 && srcoffset == dstoffset); |
327 | +} |
328 | + |
329 | +/* From mips_adjust_block_mem: |
330 | + |
331 | + Helper function for doing a loop-based block operation on memory |
332 | + reference MEM. Each iteration of the loop will operate on LENGTH |
333 | + bytes of MEM. |
334 | + |
335 | + Create a new base register for use within the loop and point it to |
336 | + the start of MEM. Create a new memory reference that uses this |
337 | + register. Store them in *LOOP_REG and *LOOP_MEM respectively. */ |
338 | + |
339 | +static void |
340 | +arm_adjust_block_mem (rtx mem, HOST_WIDE_INT length, rtx *loop_reg, |
341 | + rtx *loop_mem) |
342 | +{ |
343 | + *loop_reg = copy_addr_to_reg (XEXP (mem, 0)); |
344 | + |
345 | + /* Although the new mem does not refer to a known location, |
346 | + it does keep up to LENGTH bytes of alignment. */ |
347 | + *loop_mem = change_address (mem, BLKmode, *loop_reg); |
348 | + set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT)); |
349 | +} |
350 | + |
351 | +/* From mips_block_move_loop: |
352 | + |
353 | + Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER |
354 | + bytes at a time. LENGTH must be at least BYTES_PER_ITER. Assume that |
355 | + the memory regions do not overlap. */ |
356 | + |
357 | +static void |
358 | +arm_block_move_unaligned_loop (rtx dest, rtx src, HOST_WIDE_INT length, |
359 | + unsigned int interleave_factor, |
360 | + HOST_WIDE_INT bytes_per_iter) |
361 | +{ |
362 | + rtx label, src_reg, dest_reg, final_src, test; |
363 | + HOST_WIDE_INT leftover; |
364 | + |
365 | + leftover = length % bytes_per_iter; |
366 | + length -= leftover; |
367 | + |
368 | + /* Create registers and memory references for use within the loop. */ |
369 | + arm_adjust_block_mem (src, bytes_per_iter, &src_reg, &src); |
370 | + arm_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest); |
371 | + |
372 | + /* Calculate the value that SRC_REG should have after the last iteration of |
373 | + the loop. */ |
374 | + final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length), |
375 | + 0, 0, OPTAB_WIDEN); |
376 | + |
377 | + /* Emit the start of the loop. */ |
378 | + label = gen_label_rtx (); |
379 | + emit_label (label); |
380 | + |
381 | + /* Emit the loop body. */ |
382 | + arm_block_move_unaligned_straight (dest, src, bytes_per_iter, |
383 | + interleave_factor); |
384 | + |
385 | + /* Move on to the next block. */ |
386 | + emit_move_insn (src_reg, plus_constant (src_reg, bytes_per_iter)); |
387 | + emit_move_insn (dest_reg, plus_constant (dest_reg, bytes_per_iter)); |
388 | + |
389 | + /* Emit the loop condition. */ |
390 | + test = gen_rtx_NE (VOIDmode, src_reg, final_src); |
391 | + emit_jump_insn (gen_cbranchsi4 (test, src_reg, final_src, label)); |
392 | + |
393 | + /* Mop up any left-over bytes. */ |
394 | + if (leftover) |
395 | + arm_block_move_unaligned_straight (dest, src, leftover, interleave_factor); |
396 | +} |
397 | + |
398 | +/* Emit a block move when either the source or destination is unaligned (not |
399 | + aligned to a four-byte boundary). This may need further tuning depending on |
400 | + core type, optimize_size setting, etc. */ |
401 | + |
402 | +static int |
403 | +arm_movmemqi_unaligned (rtx *operands) |
404 | +{ |
405 | + HOST_WIDE_INT length = INTVAL (operands[2]); |
406 | + |
407 | + if (optimize_size) |
408 | + { |
409 | + bool src_aligned = MEM_ALIGN (operands[1]) >= BITS_PER_WORD; |
410 | + bool dst_aligned = MEM_ALIGN (operands[0]) >= BITS_PER_WORD; |
411 | + /* Inlined memcpy using ldr/str/ldrh/strh can be quite big: try to limit |
412 | + size of code if optimizing for size. We'll use ldm/stm if src_aligned |
413 | + or dst_aligned though: allow more interleaving in those cases since the |
414 | + resulting code can be smaller. */ |
415 | + unsigned int interleave_factor = (src_aligned || dst_aligned) ? 2 : 1; |
416 | + HOST_WIDE_INT bytes_per_iter = (src_aligned || dst_aligned) ? 8 : 4; |
417 | + |
418 | + if (length > 12) |
419 | + arm_block_move_unaligned_loop (operands[0], operands[1], length, |
420 | + interleave_factor, bytes_per_iter); |
421 | + else |
422 | + arm_block_move_unaligned_straight (operands[0], operands[1], length, |
423 | + interleave_factor); |
424 | + } |
425 | + else |
426 | + { |
427 | + /* Note that the loop created by arm_block_move_unaligned_loop may be |
428 | + subject to loop unrolling, which makes tuning this condition a little |
429 | + redundant. */ |
430 | + if (length > 32) |
431 | + arm_block_move_unaligned_loop (operands[0], operands[1], length, 4, 16); |
432 | + else |
433 | + arm_block_move_unaligned_straight (operands[0], operands[1], length, 4); |
434 | + } |
435 | + |
436 | + return 1; |
437 | +} |
438 | + |
439 | int |
440 | arm_gen_movmemqi (rtx *operands) |
441 | { |
442 | @@ -10397,8 +10748,13 @@ |
443 | |
444 | if (GET_CODE (operands[2]) != CONST_INT |
445 | || GET_CODE (operands[3]) != CONST_INT |
446 | - || INTVAL (operands[2]) > 64 |
447 | - || INTVAL (operands[3]) & 3) |
448 | + || INTVAL (operands[2]) > 64) |
449 | + return 0; |
450 | + |
451 | + if (unaligned_access && (INTVAL (operands[3]) & 3) != 0) |
452 | + return arm_movmemqi_unaligned (operands); |
453 | + |
454 | + if (INTVAL (operands[3]) & 3) |
455 | return 0; |
456 | |
457 | dstbase = operands[0]; |
458 | @@ -21659,6 +22015,10 @@ |
459 | val = 6; |
460 | asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val); |
461 | |
462 | + /* Tag_CPU_unaligned_access. */ |
463 | + asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n", |
464 | + unaligned_access); |
465 | + |
466 | /* Tag_ABI_FP_16bit_format. */ |
467 | if (arm_fp16_format) |
468 | asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n", |
469 | |
470 | === modified file 'gcc/config/arm/arm.md' |
471 | --- gcc/config/arm/arm.md 2011-06-02 15:58:33 +0000 |
472 | +++ gcc/config/arm/arm.md 2011-06-17 10:04:51 +0000 |
473 | @@ -104,6 +104,10 @@ |
474 | (UNSPEC_SYMBOL_OFFSET 27) ; The offset of the start of the symbol from |
475 | ; another symbolic address. |
476 | (UNSPEC_MEMORY_BARRIER 28) ; Represent a memory barrier. |
477 | + (UNSPEC_UNALIGNED_LOAD 29) ; Used to represent ldr/ldrh instructions that access |
478 | + ; unaligned locations, on architectures which support |
479 | + ; that. |
480 | + (UNSPEC_UNALIGNED_STORE 30) ; Same for str/strh. |
481 | ] |
482 | ) |
483 | |
484 | @@ -2450,7 +2454,7 @@ |
485 | ;;; this insv pattern, so this pattern needs to be reevalutated. |
486 | |
487 | (define_expand "insv" |
488 | - [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "") |
489 | + [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "") |
490 | (match_operand:SI 1 "general_operand" "") |
491 | (match_operand:SI 2 "general_operand" "")) |
492 | (match_operand:SI 3 "reg_or_int_operand" ""))] |
493 | @@ -2464,35 +2468,66 @@ |
494 | |
495 | if (arm_arch_thumb2) |
496 | { |
497 | - bool use_bfi = TRUE; |
498 | - |
499 | - if (GET_CODE (operands[3]) == CONST_INT) |
500 | - { |
501 | - HOST_WIDE_INT val = INTVAL (operands[3]) & mask; |
502 | - |
503 | - if (val == 0) |
504 | - { |
505 | - emit_insn (gen_insv_zero (operands[0], operands[1], |
506 | - operands[2])); |
507 | + if (unaligned_access && MEM_P (operands[0]) |
508 | + && s_register_operand (operands[3], GET_MODE (operands[3])) |
509 | + && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0) |
510 | + { |
511 | + rtx base_addr; |
512 | + |
513 | + if (width == 32) |
514 | + { |
515 | + base_addr = adjust_address (operands[0], SImode, |
516 | + start_bit / BITS_PER_UNIT); |
517 | + emit_insn (gen_unaligned_storesi (base_addr, operands[3])); |
518 | + } |
519 | + else |
520 | + { |
521 | + rtx tmp = gen_reg_rtx (HImode); |
522 | + |
523 | + base_addr = adjust_address (operands[0], HImode, |
524 | + start_bit / BITS_PER_UNIT); |
525 | + emit_move_insn (tmp, gen_lowpart (HImode, operands[3])); |
526 | + emit_insn (gen_unaligned_storehi (base_addr, tmp)); |
527 | + } |
528 | + DONE; |
529 | + } |
530 | + else if (s_register_operand (operands[0], GET_MODE (operands[0]))) |
531 | + { |
532 | + bool use_bfi = TRUE; |
533 | + |
534 | + if (GET_CODE (operands[3]) == CONST_INT) |
535 | + { |
536 | + HOST_WIDE_INT val = INTVAL (operands[3]) & mask; |
537 | + |
538 | + if (val == 0) |
539 | + { |
540 | + emit_insn (gen_insv_zero (operands[0], operands[1], |
541 | + operands[2])); |
542 | + DONE; |
543 | + } |
544 | + |
545 | + /* See if the set can be done with a single orr instruction. */ |
546 | + if (val == mask && const_ok_for_arm (val << start_bit)) |
547 | + use_bfi = FALSE; |
548 | + } |
549 | + |
550 | + if (use_bfi) |
551 | + { |
552 | + if (GET_CODE (operands[3]) != REG) |
553 | + operands[3] = force_reg (SImode, operands[3]); |
554 | + |
555 | + emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2], |
556 | + operands[3])); |
557 | DONE; |
558 | } |
559 | - |
560 | - /* See if the set can be done with a single orr instruction. */ |
561 | - if (val == mask && const_ok_for_arm (val << start_bit)) |
562 | - use_bfi = FALSE; |
563 | - } |
564 | - |
565 | - if (use_bfi) |
566 | - { |
567 | - if (GET_CODE (operands[3]) != REG) |
568 | - operands[3] = force_reg (SImode, operands[3]); |
569 | - |
570 | - emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2], |
571 | - operands[3])); |
572 | - DONE; |
573 | - } |
574 | + } |
575 | + else |
576 | + FAIL; |
577 | } |
578 | |
579 | + if (!s_register_operand (operands[0], GET_MODE (operands[0]))) |
580 | + FAIL; |
581 | + |
582 | target = copy_rtx (operands[0]); |
583 | /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical |
584 | subreg as the final target. */ |
585 | @@ -3685,7 +3720,7 @@ |
586 | |
587 | (define_expand "extzv" |
588 | [(set (match_dup 4) |
589 | - (ashift:SI (match_operand:SI 1 "register_operand" "") |
590 | + (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "") |
591 | (match_operand:SI 2 "const_int_operand" ""))) |
592 | (set (match_operand:SI 0 "register_operand" "") |
593 | (lshiftrt:SI (match_dup 4) |
594 | @@ -3698,10 +3733,53 @@ |
595 | |
596 | if (arm_arch_thumb2) |
597 | { |
598 | - emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2], |
599 | - operands[3])); |
600 | - DONE; |
601 | + HOST_WIDE_INT width = INTVAL (operands[2]); |
602 | + HOST_WIDE_INT bitpos = INTVAL (operands[3]); |
603 | + |
604 | + if (unaligned_access && MEM_P (operands[1]) |
605 | + && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0) |
606 | + { |
607 | + rtx base_addr; |
608 | + |
609 | + if (width == 32) |
610 | + { |
611 | + base_addr = adjust_address (operands[1], SImode, |
612 | + bitpos / BITS_PER_UNIT); |
613 | + emit_insn (gen_unaligned_loadsi (operands[0], base_addr)); |
614 | + } |
615 | + else |
616 | + { |
617 | + rtx dest = operands[0]; |
618 | + rtx tmp = gen_reg_rtx (SImode); |
619 | + |
620 | + /* We may get a paradoxical subreg here. Strip it off. */ |
621 | + if (GET_CODE (dest) == SUBREG |
622 | + && GET_MODE (dest) == SImode |
623 | + && GET_MODE (SUBREG_REG (dest)) == HImode) |
624 | + dest = SUBREG_REG (dest); |
625 | + |
626 | + if (GET_MODE_BITSIZE (GET_MODE (dest)) != width) |
627 | + FAIL; |
628 | + |
629 | + base_addr = adjust_address (operands[1], HImode, |
630 | + bitpos / BITS_PER_UNIT); |
631 | + emit_insn (gen_unaligned_loadhiu (tmp, base_addr)); |
632 | + emit_move_insn (gen_lowpart (SImode, dest), tmp); |
633 | + } |
634 | + DONE; |
635 | + } |
636 | + else if (s_register_operand (operands[1], GET_MODE (operands[1]))) |
637 | + { |
638 | + emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2], |
639 | + operands[3])); |
640 | + DONE; |
641 | + } |
642 | + else |
643 | + FAIL; |
644 | } |
645 | + |
646 | + if (!s_register_operand (operands[1], GET_MODE (operands[1]))) |
647 | + FAIL; |
648 | |
649 | operands[3] = GEN_INT (rshift); |
650 | |
651 | @@ -3716,7 +3794,113 @@ |
652 | }" |
653 | ) |
654 | |
655 | -(define_insn "extv" |
656 | +(define_expand "extv" |
657 | + [(set (match_operand:SI 0 "s_register_operand" "") |
658 | + (sign_extract:SI (match_operand:SI 1 "nonimmediate_operand" "") |
659 | + (match_operand:SI 2 "const_int_operand" "") |
660 | + (match_operand:SI 3 "const_int_operand" "")))] |
661 | + "arm_arch_thumb2" |
662 | +{ |
663 | + HOST_WIDE_INT width = INTVAL (operands[2]); |
664 | + HOST_WIDE_INT bitpos = INTVAL (operands[3]); |
665 | + |
666 | + if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32) |
667 | + && (bitpos % BITS_PER_UNIT) == 0) |
668 | + { |
669 | + rtx base_addr; |
670 | + |
671 | + if (width == 32) |
672 | + { |
673 | + base_addr = adjust_address (operands[1], SImode, |
674 | + bitpos / BITS_PER_UNIT); |
675 | + emit_insn (gen_unaligned_loadsi (operands[0], base_addr)); |
676 | + } |
677 | + else |
678 | + { |
679 | + rtx dest = operands[0]; |
680 | + rtx tmp = gen_reg_rtx (SImode); |
681 | + |
682 | + /* We may get a paradoxical subreg here. Strip it off. */ |
683 | + if (GET_CODE (dest) == SUBREG |
684 | + && GET_MODE (dest) == SImode |
685 | + && GET_MODE (SUBREG_REG (dest)) == HImode) |
686 | + dest = SUBREG_REG (dest); |
687 | + |
688 | + if (GET_MODE_BITSIZE (GET_MODE (dest)) != width) |
689 | + FAIL; |
690 | + |
691 | + base_addr = adjust_address (operands[1], HImode, |
692 | + bitpos / BITS_PER_UNIT); |
693 | + emit_insn (gen_unaligned_loadhis (tmp, base_addr)); |
694 | + emit_move_insn (gen_lowpart (SImode, dest), tmp); |
695 | + } |
696 | + |
697 | + DONE; |
698 | + } |
699 | + else if (!s_register_operand (operands[1], GET_MODE (operands[1]))) |
700 | + FAIL; |
701 | +}) |
702 | + |
703 | +; ARMv6+ unaligned load/store instructions (used for packed structure accesses). |
704 | + |
705 | +(define_insn "unaligned_loadsi" |
706 | + [(set (match_operand:SI 0 "s_register_operand" "=l,r") |
707 | + (unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")] |
708 | + UNSPEC_UNALIGNED_LOAD))] |
709 | + "unaligned_access && TARGET_32BIT" |
710 | + "ldr%?\t%0, %1\t@ unaligned" |
711 | + [(set_attr "arch" "t2,any") |
712 | + (set_attr "length" "2,4") |
713 | + (set_attr "predicable" "yes") |
714 | + (set_attr "type" "load1")]) |
715 | + |
716 | +(define_insn "unaligned_loadhis" |
717 | + [(set (match_operand:SI 0 "s_register_operand" "=l,r") |
718 | + (sign_extend:SI |
719 | + (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")] |
720 | + UNSPEC_UNALIGNED_LOAD)))] |
721 | + "unaligned_access && TARGET_32BIT" |
722 | + "ldr%(sh%)\t%0, %1\t@ unaligned" |
723 | + [(set_attr "arch" "t2,any") |
724 | + (set_attr "length" "2,4") |
725 | + (set_attr "predicable" "yes") |
726 | + (set_attr "type" "load_byte")]) |
727 | + |
728 | +(define_insn "unaligned_loadhiu" |
729 | + [(set (match_operand:SI 0 "s_register_operand" "=l,r") |
730 | + (zero_extend:SI |
731 | + (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")] |
732 | + UNSPEC_UNALIGNED_LOAD)))] |
733 | + "unaligned_access && TARGET_32BIT" |
734 | + "ldr%(h%)\t%0, %1\t@ unaligned" |
735 | + [(set_attr "arch" "t2,any") |
736 | + (set_attr "length" "2,4") |
737 | + (set_attr "predicable" "yes") |
738 | + (set_attr "type" "load_byte")]) |
739 | + |
740 | +(define_insn "unaligned_storesi" |
741 | + [(set (match_operand:SI 0 "memory_operand" "=Uw,m") |
742 | + (unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")] |
743 | + UNSPEC_UNALIGNED_STORE))] |
744 | + "unaligned_access && TARGET_32BIT" |
745 | + "str%?\t%1, %0\t@ unaligned" |
746 | + [(set_attr "arch" "t2,any") |
747 | + (set_attr "length" "2,4") |
748 | + (set_attr "predicable" "yes") |
749 | + (set_attr "type" "store1")]) |
750 | + |
751 | +(define_insn "unaligned_storehi" |
752 | + [(set (match_operand:HI 0 "memory_operand" "=Uw,m") |
753 | + (unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")] |
754 | + UNSPEC_UNALIGNED_STORE))] |
755 | + "unaligned_access && TARGET_32BIT" |
756 | + "str%(h%)\t%1, %0\t@ unaligned" |
757 | + [(set_attr "arch" "t2,any") |
758 | + (set_attr "length" "2,4") |
759 | + (set_attr "predicable" "yes") |
760 | + (set_attr "type" "store1")]) |
761 | + |
762 | +(define_insn "*extv_reg" |
763 | [(set (match_operand:SI 0 "s_register_operand" "=r") |
764 | (sign_extract:SI (match_operand:SI 1 "s_register_operand" "r") |
765 | (match_operand:SI 2 "const_int_operand" "M") |
766 | @@ -6003,8 +6187,8 @@ |
767 | |
768 | |
769 | (define_insn "*arm_movqi_insn" |
770 | - [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m") |
771 | - (match_operand:QI 1 "general_operand" "rI,K,m,r"))] |
772 | + [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,l,Uu,r,m") |
773 | + (match_operand:QI 1 "general_operand" "rI,K,Uu,l,m,r"))] |
774 | "TARGET_32BIT |
775 | && ( register_operand (operands[0], QImode) |
776 | || register_operand (operands[1], QImode))" |
777 | @@ -6012,10 +6196,14 @@ |
778 | mov%?\\t%0, %1 |
779 | mvn%?\\t%0, #%B1 |
780 | ldr%(b%)\\t%0, %1 |
781 | + str%(b%)\\t%1, %0 |
782 | + ldr%(b%)\\t%0, %1 |
783 | str%(b%)\\t%1, %0" |
784 | - [(set_attr "type" "*,*,load1,store1") |
785 | - (set_attr "insn" "mov,mvn,*,*") |
786 | - (set_attr "predicable" "yes")] |
787 | + [(set_attr "type" "*,*,load1,store1,load1,store1") |
788 | + (set_attr "insn" "mov,mvn,*,*,*,*") |
789 | + (set_attr "predicable" "yes") |
790 | + (set_attr "arch" "any,any,t2,t2,any,any") |
791 | + (set_attr "length" "4,4,2,2,4,4")] |
792 | ) |
793 | |
794 | (define_insn "*thumb1_movqi_insn" |
795 | |
796 | === modified file 'gcc/config/arm/arm.opt' |
797 | --- gcc/config/arm/arm.opt 2009-06-18 11:24:10 +0000 |
798 | +++ gcc/config/arm/arm.opt 2011-06-17 10:04:51 +0000 |
799 | @@ -169,3 +169,7 @@ |
800 | Target Report Var(fix_cm3_ldrd) Init(2) |
801 | Avoid overlapping destination and address registers on LDRD instructions |
802 | that may trigger Cortex-M3 errata. |
803 | + |
804 | +munaligned-access |
805 | +Target Report Var(unaligned_access) Init(2) |
806 | +Enable unaligned word and halfword accesses to packed data. |
807 | |
808 | === modified file 'gcc/config/arm/constraints.md' |
809 | --- gcc/config/arm/constraints.md 2011-01-03 20:52:22 +0000 |
810 | +++ gcc/config/arm/constraints.md 2011-06-17 10:04:51 +0000 |
811 | @@ -36,6 +36,7 @@ |
812 | ;; The following memory constraints have been used: |
813 | ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us |
814 | ;; in ARM state: Uq |
815 | +;; in Thumb state: Uu, Uw |
816 | |
817 | |
818 | (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS" |
819 | @@ -327,6 +328,27 @@ |
820 | (and (match_code "mem") |
821 | (match_test "REG_P (XEXP (op, 0))"))) |
822 | |
823 | +(define_memory_constraint "Uu" |
824 | + "@internal |
825 | + In Thumb state an address that is valid in 16bit encoding." |
826 | + (and (match_code "mem") |
827 | + (match_test "TARGET_THUMB |
828 | + && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0), |
829 | + 0)"))) |
830 | + |
831 | +; The 16-bit post-increment LDR/STR accepted by thumb1_legitimate_address_p |
832 | +; are actually LDM/STM instructions, so cannot be used to access unaligned |
833 | +; data. |
834 | +(define_memory_constraint "Uw" |
835 | + "@internal |
836 | + In Thumb state an address that is valid in 16bit encoding, and that can be |
837 | + used for unaligned accesses." |
838 | + (and (match_code "mem") |
839 | + (match_test "TARGET_THUMB |
840 | + && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0), |
841 | + 0) |
842 | + && GET_CODE (XEXP (op, 0)) != POST_INC"))) |
843 | + |
844 | ;; We used to have constraint letters for S and R in ARM state, but |
845 | ;; all uses of these now appear to have been removed. |
846 | |
847 | |
848 | === modified file 'gcc/expmed.c' |
849 | --- gcc/expmed.c 2011-05-22 19:02:59 +0000 |
850 | +++ gcc/expmed.c 2011-06-17 10:04:51 +0000 |
851 | @@ -703,7 +703,7 @@ |
852 | /* On big-endian machines, we count bits from the most significant. |
853 | If the bit field insn does not, we must invert. */ |
854 | |
855 | - if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN) |
856 | + if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0)) |
857 | xbitpos = unit - bitsize - xbitpos; |
858 | |
859 | /* We have been counting XBITPOS within UNIT. |
860 | @@ -1554,7 +1554,7 @@ |
861 | |
862 | /* On big-endian machines, we count bits from the most significant. |
863 | If the bit field insn does not, we must invert. */ |
864 | - if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN) |
865 | + if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0)) |
866 | xbitpos = unit - bitsize - xbitpos; |
867 | |
868 | /* Now convert from counting within UNIT to counting in EXT_MODE. */ |
cbuild has taken a snapshot of this branch at r106761 and queued it for build.
The snapshot is available at:
http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6.tar.xdelta3.xz
and will be built on the following builders:
a9-builder armv5-builder i686 x86_64
You can track the build queue at:
http://ex.seabright.co.nz/helpers/scheduler
cbuild-snapshot: gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106758
cbuild-state: check