Merge lp:~ams-codesourcery/gcc-linaro/unaligned-accesses-4.6 into lp:gcc-linaro/4.6

Proposed by Andrew Stubbs
Status: Rejected
Rejected by: Michael Hope
Proposed branch: lp:~ams-codesourcery/gcc-linaro/unaligned-accesses-4.6
Merge into: lp:gcc-linaro/4.6
Diff against target: 868 lines (+660/-41)
7 files modified
ChangeLog.linaro (+44/-0)
gcc/config/arm/arm-protos.h (+1/-0)
gcc/config/arm/arm.c (+363/-3)
gcc/config/arm/arm.md (+224/-36)
gcc/config/arm/arm.opt (+4/-0)
gcc/config/arm/constraints.md (+22/-0)
gcc/expmed.c (+2/-2)
To merge this branch: bzr merge lp:~ams-codesourcery/gcc-linaro/unaligned-accesses-4.6
Reviewer            Status
Michael Hope        Disapprove
Richard Sandiford   Needs Fixing
Review via email: mp+64957@code.launchpad.net

Description of the change

Backport of Julian Brown's unaligned-access patches, proposed for GCC 4.7 but not yet approved. This also required a backport of another prerequisite patch.

Original patches:
  http://gcc.gnu.org/viewcvs?view=revision&revision=172697
  http://<email address hidden>/msg06575.html
  http://<email address hidden>/msg05062.html

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild has taken a snapshot of this branch at r106761 and queued it for build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6.tar.xdelta3.xz

and will be built on the following builders:
 a9-builder armv5-builder i686 x86_64

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106758
cbuild-state: check

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on i686-lucid-cbuild132-scorpius-i686r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/i686-lucid-cbuild132-scorpius-i686r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106758.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/i686-lucid-cbuild132-scorpius-i686r1/gcc-testsuite.txt

cbuild-checked: i686-lucid-cbuild132-scorpius-i686r1

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on x86_64-maverick-cbuild132-crucis-x86_64r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/x86_64-maverick-cbuild132-crucis-x86_64r1

The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106758:
 -PASS: gcc.dg/vect/slp-reduc-5.c -flto execution test
 +FAIL: gcc.dg/vect/slp-reduc-5.c -flto execution test

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/x86_64-maverick-cbuild132-crucis-x86_64r1/gcc-testsuite.txt

cbuild-checked: x86_64-maverick-cbuild132-crucis-x86_64r1

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on armv7l-natty-cbuild135-ursa4-cortexa9r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/armv7l-natty-cbuild135-ursa4-cortexa9r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106758.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/armv7l-natty-cbuild135-ursa4-cortexa9r1/gcc-testsuite.txt

cbuild-checked: armv7l-natty-cbuild135-ursa4-cortexa9r1

Revision history for this message
Michael Hope (michaelh1) wrote :

I'm re-running the x86_64 baseline and build under natty. The same problem showed up in Ramana's branch.

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on x86_64-natty-cbuild136-crucis-x86_64r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/x86_64-natty-cbuild136-crucis-x86_64r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106758.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106761~ams-codesourcery~unaligned-accesses-4.6/logs/x86_64-natty-cbuild136-crucis-x86_64r1/gcc-testsuite.txt

cbuild-checked: x86_64-natty-cbuild136-crucis-x86_64r1

Revision history for this message
Michael Hope (michaelh1) wrote :

Note that after re-running the baseline the regression has cleared.

Revision history for this message
Michael Hope (michaelh1) wrote :

I ran a little test at -O2 on an A9 and the functionality looks good. Taking this code:

"""
struct foo
{
    char pad[2];
    int bar;
    char pad2[2];
    int block[2];
}; // __attribute__((packed));

int get(struct foo* pfoo)
{
    return pfoo->bar;
}

void copy(struct foo* pfoo, int* pinto)
{
    __builtin_memcpy(pinto, pfoo->block, sizeof(pfoo->block));
}

void assign(struct foo* pfoo, struct foo* pinto)
{
    *pinto = *pfoo;
}
"""

Without the 'packed' attribute:
 * 4.6-2011.06 generates a load; a call to memcpy; and an ldmia/stmia
 * This branch generates a load; an inline copy; and an ldmia/stmia

With the 'packed' attribute:
 * 4.6-2011.06 generates a byte-by-byte load; a call to memcpy; and a call to memcpy
 * This branch generates a load; an inline copy; and an ldr-based inline copy

With 'packed' and -march=armv5te -marm, this branch generates the right thing: a byte-by-byte load; a call to memcpy; and a call to memcpy.
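
For reference, the 'packed' runs above presumably just enable the attribute that is commented out in the listing, i.e. something like this sketch (the name foo_packed is made up here):

"""
/* Same layout as struct foo above, but packed, so 'bar' and 'block' are
   only byte-aligned.  */
struct foo_packed
{
    char pad[2];
    int bar;
    char pad2[2];
    int block[2];
} __attribute__((packed));
"""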

Revision history for this message
Ramana Radhakrishnan (ramana) wrote :

The eabi attribute changes look ok to me.

cheers
Ramana

Revision history for this message
Ulrich Weigand (uweigand) wrote :

The patch in general looks good to me, with the exception of the expmed.c common-code change. I agree that the common code must have a bug here, but I don't think this patch is actually a correct fix ... I'll need to look into this a bit more.

Revision history for this message
Ulrich Weigand (uweigand) wrote :

The problematic code in expmed.c looks like this:

      /* On big-endian machines, we count bits from the most significant.
         If the bit field insn does not, we must invert. */

      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
        xbitpos = unit - bitsize - xbitpos;

      /* We have been counting XBITPOS within UNIT.
         Count instead within the size of the register. */
      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
        xbitpos += GET_MODE_BITSIZE (op_mode) - unit;

      unit = GET_MODE_BITSIZE (op_mode);

It seems to me the problem is that the two corrections are performed in reverse order, and it should actually be like this (note that the first if must now check BYTES_BIG_ENDIAN, not BITS_BIG_ENDIAN):

      /* We have been counting XBITPOS within UNIT.
         Count instead within the size of the register. */
      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
        xbitpos += GET_MODE_BITSIZE (op_mode) - unit;

      unit = GET_MODE_BITSIZE (op_mode);

      /* On big-endian machines, we count bits from the most significant.
         If the bit field insn does not, we must invert. */

      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
        xbitpos = unit - bitsize - xbitpos;

This gives the same results as the previous code for all register operations, but now does the right thing for MEM operations on BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN machines ...
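
As a sanity check on the "same results for all register operations" claim, here is a small standalone sketch (not GCC code; the helper names, the fixed 64-bit op_mode size and the loop bounds are made up for illustration) that compares the two orderings for non-MEM operands over all four endianness combinations:

#include <assert.h>

/* Original ordering, specialised to a register operand (!MEM_P is true).  */
static int
old_order (int bits_be, int bytes_be, int unit, int op_bits,
           int bitsize, int xbitpos)
{
  if (bits_be != bytes_be)
    xbitpos = unit - bitsize - xbitpos;
  if (bits_be)
    xbitpos += op_bits - unit;
  return xbitpos;
}

/* Proposed ordering: register-size correction first, then the inversion,
   with the first test now checking BYTES_BIG_ENDIAN.  */
static int
new_order (int bits_be, int bytes_be, int unit, int op_bits,
           int bitsize, int xbitpos)
{
  if (bytes_be)
    xbitpos += op_bits - unit;
  unit = op_bits;
  if (bits_be != bytes_be)
    xbitpos = unit - bitsize - xbitpos;
  return xbitpos;
}

int
main (void)
{
  /* Exhaustively compare both orderings for register operands.  */
  for (int bits_be = 0; bits_be <= 1; bits_be++)
    for (int bytes_be = 0; bytes_be <= 1; bytes_be++)
      for (int unit = 8; unit <= 32; unit += 8)
        for (int bitsize = 1; bitsize <= unit; bitsize++)
          for (int xbitpos = 0; xbitpos + bitsize <= unit; xbitpos++)
            assert (old_order (bits_be, bytes_be, unit, 64, bitsize, xbitpos)
                    == new_order (bits_be, bytes_be, unit, 64, bitsize, xbitpos));
  return 0;
}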

Revision history for this message
Ulrich Weigand (uweigand) wrote :

Started discussing the expmed.c change on the mailing list here:
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00174.html

Revision history for this message
Richard Sandiford (rsandifo) wrote :

I think we should include this too:

    http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00804.html

It'll need a bit of tweaking because 4.5 and 4.6 don't enforce
the movmisalign predicates. Something like:

    if (!neon_struct_operand (operands[0], <MODE>mode))
      {
        addr = force_reg (Pmode, XEXP (operands[0], 0));
        operands[0] = replace_equiv_address (operands[0], addr);
      }

at the end of the define_expand (and the same for operands[1]).

review: Needs Fixing
Revision history for this message
Michael Hope (michaelh1) wrote :

Withdrawn. We'll see how Julian does upstream.

review: Disapprove

Unmerged revisions

106761. By Andrew Stubbs

Unaligned accesses for builtin memcpy.

Backport from proposed patch for FSF.

106760. By Andrew Stubbs

Unaligned accesses for packed types.

Backport from proposed FSF patch.

106759. By Andrew Stubbs

Backport prerequisite patch for unaligned access patches.

Backport from FSF.

Preview Diff

1=== modified file 'ChangeLog.linaro'
2--- ChangeLog.linaro 2011-06-14 14:09:57 +0000
3+++ ChangeLog.linaro 2011-06-17 10:04:51 +0000
4@@ -1,3 +1,47 @@
5+2011-06-17 Andrew Stubbs <ams@codesourcery.com>
6+
7+ Backport proposed patches from gcc-patches@gcc.gnu.org:
8+
9+ Julian Brown <julian@codesourcery.com>
10+
11+ gcc/
12+ * config/arm/arm.c (arm_block_move_unaligned_straight)
13+ (arm_adjust_block_mem, arm_block_move_unaligned_loop)
14+ (arm_movmemqi_unaligned): New.
15+ (arm_gen_movmemqi): Support unaligned block copies.
16+
17+ Julian Brown <julian@codesourcery.com>
18+
19+ gcc/
20+ * config/arm/arm.c (arm_override_options): Add unaligned_access
21+ support.
22+ (arm_file_start): Emit attribute for unaligned access as
23+ appropriate.
24+ * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
25+ (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
26+ (insv, extzv): Add unaligned-access support.
27+ (extv): Change to expander. Likewise.
28+ (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
29+ (unaligned_storesi, unaligned_storehi): New.
30+ (*extv_reg): New (previous extv implementation).
31+ * config/arm/arm.opt (munaligned_access): Add option.
32+ * config/arm/constraints.md (Uw): New constraint.
33+ * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
34+ memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
35+ (extract_bit_field_1): Likewise.
36+
37+ Backport from FSF:
38+
39+ 2011-04-19 Wei Guozhi <carrot@google.com>
40+
41+ PR target/47855
42+ gcc/
43+ * config/arm/arm-protos.h (thumb1_legitimate_address_p): New prototype.
44+ * config/arm/arm.c (thumb1_legitimate_address_p): Remove the static
45+ linkage.
46+ * config/arm/constraints.md (Uu): New constraint.
47+ * config/arm/arm.md (*arm_movqi_insn): Compute attr "length".
48+
49 2011-06-14 Andrew Stubbs <ams@codesourcery.com>
50
51 gcc/
52
53=== modified file 'gcc/config/arm/arm-protos.h'
54--- gcc/config/arm/arm-protos.h 2011-05-03 15:17:25 +0000
55+++ gcc/config/arm/arm-protos.h 2011-06-17 10:04:51 +0000
56@@ -58,6 +58,7 @@
57 int);
58 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
59 int);
60+extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
61 extern int arm_const_double_rtx (rtx);
62 extern int neg_const_double_rtx_ok_for_fpa (rtx);
63 extern int vfp3_const_double_rtx (rtx);
64
65=== modified file 'gcc/config/arm/arm.c'
66--- gcc/config/arm/arm.c 2011-05-11 14:49:48 +0000
67+++ gcc/config/arm/arm.c 2011-06-17 10:04:51 +0000
68@@ -1978,6 +1978,28 @@
69 fix_cm3_ldrd = 0;
70 }
71
72+ /* Enable -munaligned-access by default for
73+ - all ARMv6 architecture-based processors
74+ - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
75+
76+ Disable -munaligned-access by default for
77+ - all pre-ARMv6 architecture-based processors
78+ - ARMv6-M architecture-based processors. */
79+
80+ if (unaligned_access == 2)
81+ {
82+ if (arm_arch6 && (arm_arch_notm || arm_arch7))
83+ unaligned_access = 1;
84+ else
85+ unaligned_access = 0;
86+ }
87+ else if (unaligned_access == 1
88+ && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
89+ {
90+ warning (0, "target CPU does not support unaligned accesses");
91+ unaligned_access = 0;
92+ }
93+
94 if (TARGET_THUMB1 && flag_schedule_insns)
95 {
96 /* Don't warn since it's on by default in -O2. */
97@@ -5929,7 +5951,7 @@
98 addresses based on the frame pointer or arg pointer until the
99 reload pass starts. This is so that eliminating such addresses
100 into stack based ones won't produce impossible code. */
101-static int
102+int
103 thumb1_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
104 {
105 /* ??? Not clear if this is right. Experiment. */
106@@ -10385,6 +10407,335 @@
107 return true;
108 }
109
110+/* Copy a block of memory using plain ldr/str/ldrh/strh instructions, to permit
111+ unaligned copies on processors which support unaligned semantics for those
112+ instructions. INTERLEAVE_FACTOR can be used to attempt to hide load latency
113+ (using more registers) by doing e.g. load/load/store/store for a factor of 2.
114+ An interleave factor of 1 (the minimum) will perform no interleaving.
115+ Load/store multiple are used for aligned addresses where possible. */
116+
117+static void
118+arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
119+ HOST_WIDE_INT length,
120+ unsigned int interleave_factor)
121+{
122+ rtx *regs = XALLOCAVEC (rtx, interleave_factor);
123+ int *regnos = XALLOCAVEC (int, interleave_factor);
124+ HOST_WIDE_INT block_size_bytes = interleave_factor * UNITS_PER_WORD;
125+ HOST_WIDE_INT i, j;
126+ HOST_WIDE_INT remaining = length, words;
127+ rtx halfword_tmp = NULL, byte_tmp = NULL;
128+ rtx dst, src;
129+ bool src_aligned = MEM_ALIGN (srcbase) >= BITS_PER_WORD;
130+ bool dst_aligned = MEM_ALIGN (dstbase) >= BITS_PER_WORD;
131+ HOST_WIDE_INT srcoffset, dstoffset;
132+ HOST_WIDE_INT src_autoinc, dst_autoinc;
133+ rtx mem, addr;
134+
135+ gcc_assert (1 <= interleave_factor && interleave_factor <= 4);
136+
137+ /* Use hard registers if we have aligned source or destination so we can use
138+ load/store multiple with contiguous registers. */
139+ if (dst_aligned || src_aligned)
140+ for (i = 0; i < interleave_factor; i++)
141+ regs[i] = gen_rtx_REG (SImode, i);
142+ else
143+ for (i = 0; i < interleave_factor; i++)
144+ regs[i] = gen_reg_rtx (SImode);
145+
146+ dst = copy_addr_to_reg (XEXP (dstbase, 0));
147+ src = copy_addr_to_reg (XEXP (srcbase, 0));
148+
149+ srcoffset = dstoffset = 0;
150+
151+ /* Calls to arm_gen_load_multiple and arm_gen_store_multiple update SRC/DST.
152+ For copying the last bytes we want to subtract this offset again. */
153+ src_autoinc = dst_autoinc = 0;
154+
155+ for (i = 0; i < interleave_factor; i++)
156+ regnos[i] = i;
157+
158+ /* Copy BLOCK_SIZE_BYTES chunks. */
159+
160+ for (i = 0; i + block_size_bytes <= length; i += block_size_bytes)
161+ {
162+ /* Load words. */
163+ if (src_aligned && interleave_factor > 1)
164+ {
165+ emit_insn (arm_gen_load_multiple (regnos, interleave_factor, src,
166+ TRUE, srcbase, &srcoffset));
167+ src_autoinc += UNITS_PER_WORD * interleave_factor;
168+ }
169+ else
170+ {
171+ for (j = 0; j < interleave_factor; j++)
172+ {
173+ addr = plus_constant (src, srcoffset + j * UNITS_PER_WORD
174+ - src_autoinc);
175+ mem = adjust_automodify_address (srcbase, SImode, addr,
176+ srcoffset + j * UNITS_PER_WORD);
177+ emit_insn (gen_unaligned_loadsi (regs[j], mem));
178+ }
179+ srcoffset += block_size_bytes;
180+ }
181+
182+ /* Store words. */
183+ if (dst_aligned && interleave_factor > 1)
184+ {
185+ emit_insn (arm_gen_store_multiple (regnos, interleave_factor, dst,
186+ TRUE, dstbase, &dstoffset));
187+ dst_autoinc += UNITS_PER_WORD * interleave_factor;
188+ }
189+ else
190+ {
191+ for (j = 0; j < interleave_factor; j++)
192+ {
193+ addr = plus_constant (dst, dstoffset + j * UNITS_PER_WORD
194+ - dst_autoinc);
195+ mem = adjust_automodify_address (dstbase, SImode, addr,
196+ dstoffset + j * UNITS_PER_WORD);
197+ emit_insn (gen_unaligned_storesi (mem, regs[j]));
198+ }
199+ dstoffset += block_size_bytes;
200+ }
201+
202+ remaining -= block_size_bytes;
203+ }
204+
205+ /* Copy any whole words left (note these aren't interleaved with any
206+ subsequent halfword/byte load/stores in the interests of simplicity). */
207+
208+ words = remaining / UNITS_PER_WORD;
209+
210+ gcc_assert (words < interleave_factor);
211+
212+ if (src_aligned && words > 1)
213+ {
214+ emit_insn (arm_gen_load_multiple (regnos, words, src, TRUE, srcbase,
215+ &srcoffset));
216+ src_autoinc += UNITS_PER_WORD * words;
217+ }
218+ else
219+ {
220+ for (j = 0; j < words; j++)
221+ {
222+ addr = plus_constant (src,
223+ srcoffset + j * UNITS_PER_WORD - src_autoinc);
224+ mem = adjust_automodify_address (srcbase, SImode, addr,
225+ srcoffset + j * UNITS_PER_WORD);
226+ emit_insn (gen_unaligned_loadsi (regs[j], mem));
227+ }
228+ srcoffset += words * UNITS_PER_WORD;
229+ }
230+
231+ if (dst_aligned && words > 1)
232+ {
233+ emit_insn (arm_gen_store_multiple (regnos, words, dst, TRUE, dstbase,
234+ &dstoffset));
235+ dst_autoinc += words * UNITS_PER_WORD;
236+ }
237+ else
238+ {
239+ for (j = 0; j < words; j++)
240+ {
241+ addr = plus_constant (dst,
242+ dstoffset + j * UNITS_PER_WORD - dst_autoinc);
243+ mem = adjust_automodify_address (dstbase, SImode, addr,
244+ dstoffset + j * UNITS_PER_WORD);
245+ emit_insn (gen_unaligned_storesi (mem, regs[j]));
246+ }
247+ dstoffset += words * UNITS_PER_WORD;
248+ }
249+
250+ remaining -= words * UNITS_PER_WORD;
251+
252+ gcc_assert (remaining < 4);
253+
254+ /* Copy a halfword if necessary. */
255+
256+ if (remaining >= 2)
257+ {
258+ halfword_tmp = gen_reg_rtx (SImode);
259+
260+ addr = plus_constant (src, srcoffset - src_autoinc);
261+ mem = adjust_automodify_address (srcbase, HImode, addr, srcoffset);
262+ emit_insn (gen_unaligned_loadhiu (halfword_tmp, mem));
263+
264+ /* Either write out immediately, or delay until we've loaded the last
265+ byte, depending on interleave factor. */
266+ if (interleave_factor == 1)
267+ {
268+ addr = plus_constant (dst, dstoffset - dst_autoinc);
269+ mem = adjust_automodify_address (dstbase, HImode, addr, dstoffset);
270+ emit_insn (gen_unaligned_storehi (mem,
271+ gen_lowpart (HImode, halfword_tmp)));
272+ halfword_tmp = NULL;
273+ dstoffset += 2;
274+ }
275+
276+ remaining -= 2;
277+ srcoffset += 2;
278+ }
279+
280+ gcc_assert (remaining < 2);
281+
282+ /* Copy last byte. */
283+
284+ if ((remaining & 1) != 0)
285+ {
286+ byte_tmp = gen_reg_rtx (SImode);
287+
288+ addr = plus_constant (src, srcoffset - src_autoinc);
289+ mem = adjust_automodify_address (srcbase, QImode, addr, srcoffset);
290+ emit_move_insn (gen_lowpart (QImode, byte_tmp), mem);
291+
292+ if (interleave_factor == 1)
293+ {
294+ addr = plus_constant (dst, dstoffset - dst_autoinc);
295+ mem = adjust_automodify_address (dstbase, QImode, addr, dstoffset);
296+ emit_move_insn (mem, gen_lowpart (QImode, byte_tmp));
297+ byte_tmp = NULL;
298+ dstoffset++;
299+ }
300+
301+ remaining--;
302+ srcoffset++;
303+ }
304+
305+ /* Store last halfword if we haven't done so already. */
306+
307+ if (halfword_tmp)
308+ {
309+ addr = plus_constant (dst, dstoffset - dst_autoinc);
310+ mem = adjust_automodify_address (dstbase, HImode, addr, dstoffset);
311+ emit_insn (gen_unaligned_storehi (mem,
312+ gen_lowpart (HImode, halfword_tmp)));
313+ dstoffset += 2;
314+ }
315+
316+ /* Likewise for last byte. */
317+
318+ if (byte_tmp)
319+ {
320+ addr = plus_constant (dst, dstoffset - dst_autoinc);
321+ mem = adjust_automodify_address (dstbase, QImode, addr, dstoffset);
322+ emit_move_insn (mem, gen_lowpart (QImode, byte_tmp));
323+ dstoffset++;
324+ }
325+
326+ gcc_assert (remaining == 0 && srcoffset == dstoffset);
327+}
328+
329+/* From mips_adjust_block_mem:
330+
331+ Helper function for doing a loop-based block operation on memory
332+ reference MEM. Each iteration of the loop will operate on LENGTH
333+ bytes of MEM.
334+
335+ Create a new base register for use within the loop and point it to
336+ the start of MEM. Create a new memory reference that uses this
337+ register. Store them in *LOOP_REG and *LOOP_MEM respectively. */
338+
339+static void
340+arm_adjust_block_mem (rtx mem, HOST_WIDE_INT length, rtx *loop_reg,
341+ rtx *loop_mem)
342+{
343+ *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
344+
345+ /* Although the new mem does not refer to a known location,
346+ it does keep up to LENGTH bytes of alignment. */
347+ *loop_mem = change_address (mem, BLKmode, *loop_reg);
348+ set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
349+}
350+
351+/* From mips_block_move_loop:
352+
353+ Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
354+ bytes at a time. LENGTH must be at least BYTES_PER_ITER. Assume that
355+ the memory regions do not overlap. */
356+
357+static void
358+arm_block_move_unaligned_loop (rtx dest, rtx src, HOST_WIDE_INT length,
359+ unsigned int interleave_factor,
360+ HOST_WIDE_INT bytes_per_iter)
361+{
362+ rtx label, src_reg, dest_reg, final_src, test;
363+ HOST_WIDE_INT leftover;
364+
365+ leftover = length % bytes_per_iter;
366+ length -= leftover;
367+
368+ /* Create registers and memory references for use within the loop. */
369+ arm_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
370+ arm_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
371+
372+ /* Calculate the value that SRC_REG should have after the last iteration of
373+ the loop. */
374+ final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length),
375+ 0, 0, OPTAB_WIDEN);
376+
377+ /* Emit the start of the loop. */
378+ label = gen_label_rtx ();
379+ emit_label (label);
380+
381+ /* Emit the loop body. */
382+ arm_block_move_unaligned_straight (dest, src, bytes_per_iter,
383+ interleave_factor);
384+
385+ /* Move on to the next block. */
386+ emit_move_insn (src_reg, plus_constant (src_reg, bytes_per_iter));
387+ emit_move_insn (dest_reg, plus_constant (dest_reg, bytes_per_iter));
388+
389+ /* Emit the loop condition. */
390+ test = gen_rtx_NE (VOIDmode, src_reg, final_src);
391+ emit_jump_insn (gen_cbranchsi4 (test, src_reg, final_src, label));
392+
393+ /* Mop up any left-over bytes. */
394+ if (leftover)
395+ arm_block_move_unaligned_straight (dest, src, leftover, interleave_factor);
396+}
397+
398+/* Emit a block move when either the source or destination is unaligned (not
399+ aligned to a four-byte boundary). This may need further tuning depending on
400+ core type, optimize_size setting, etc. */
401+
402+static int
403+arm_movmemqi_unaligned (rtx *operands)
404+{
405+ HOST_WIDE_INT length = INTVAL (operands[2]);
406+
407+ if (optimize_size)
408+ {
409+ bool src_aligned = MEM_ALIGN (operands[1]) >= BITS_PER_WORD;
410+ bool dst_aligned = MEM_ALIGN (operands[0]) >= BITS_PER_WORD;
411+ /* Inlined memcpy using ldr/str/ldrh/strh can be quite big: try to limit
412+ size of code if optimizing for size. We'll use ldm/stm if src_aligned
413+ or dst_aligned though: allow more interleaving in those cases since the
414+ resulting code can be smaller. */
415+ unsigned int interleave_factor = (src_aligned || dst_aligned) ? 2 : 1;
416+ HOST_WIDE_INT bytes_per_iter = (src_aligned || dst_aligned) ? 8 : 4;
417+
418+ if (length > 12)
419+ arm_block_move_unaligned_loop (operands[0], operands[1], length,
420+ interleave_factor, bytes_per_iter);
421+ else
422+ arm_block_move_unaligned_straight (operands[0], operands[1], length,
423+ interleave_factor);
424+ }
425+ else
426+ {
427+ /* Note that the loop created by arm_block_move_unaligned_loop may be
428+ subject to loop unrolling, which makes tuning this condition a little
429+ redundant. */
430+ if (length > 32)
431+ arm_block_move_unaligned_loop (operands[0], operands[1], length, 4, 16);
432+ else
433+ arm_block_move_unaligned_straight (operands[0], operands[1], length, 4);
434+ }
435+
436+ return 1;
437+}
438+
439 int
440 arm_gen_movmemqi (rtx *operands)
441 {
442@@ -10397,8 +10748,13 @@
443
444 if (GET_CODE (operands[2]) != CONST_INT
445 || GET_CODE (operands[3]) != CONST_INT
446- || INTVAL (operands[2]) > 64
447- || INTVAL (operands[3]) & 3)
448+ || INTVAL (operands[2]) > 64)
449+ return 0;
450+
451+ if (unaligned_access && (INTVAL (operands[3]) & 3) != 0)
452+ return arm_movmemqi_unaligned (operands);
453+
454+ if (INTVAL (operands[3]) & 3)
455 return 0;
456
457 dstbase = operands[0];
458@@ -21659,6 +22015,10 @@
459 val = 6;
460 asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
461
462+ /* Tag_CPU_unaligned_access. */
463+ asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
464+ unaligned_access);
465+
466 /* Tag_ABI_FP_16bit_format. */
467 if (arm_fp16_format)
468 asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
469
470=== modified file 'gcc/config/arm/arm.md'
471--- gcc/config/arm/arm.md 2011-06-02 15:58:33 +0000
472+++ gcc/config/arm/arm.md 2011-06-17 10:04:51 +0000
473@@ -104,6 +104,10 @@
474 (UNSPEC_SYMBOL_OFFSET 27) ; The offset of the start of the symbol from
475 ; another symbolic address.
476 (UNSPEC_MEMORY_BARRIER 28) ; Represent a memory barrier.
477+ (UNSPEC_UNALIGNED_LOAD 29) ; Used to represent ldr/ldrh instructions that access
478+ ; unaligned locations, on architectures which support
479+ ; that.
480+ (UNSPEC_UNALIGNED_STORE 30) ; Same for str/strh.
481 ]
482 )
483
484@@ -2450,7 +2454,7 @@
485 ;;; this insv pattern, so this pattern needs to be reevalutated.
486
487 (define_expand "insv"
488- [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
489+ [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
490 (match_operand:SI 1 "general_operand" "")
491 (match_operand:SI 2 "general_operand" ""))
492 (match_operand:SI 3 "reg_or_int_operand" ""))]
493@@ -2464,35 +2468,66 @@
494
495 if (arm_arch_thumb2)
496 {
497- bool use_bfi = TRUE;
498-
499- if (GET_CODE (operands[3]) == CONST_INT)
500- {
501- HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
502-
503- if (val == 0)
504- {
505- emit_insn (gen_insv_zero (operands[0], operands[1],
506- operands[2]));
507+ if (unaligned_access && MEM_P (operands[0])
508+ && s_register_operand (operands[3], GET_MODE (operands[3]))
509+ && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
510+ {
511+ rtx base_addr;
512+
513+ if (width == 32)
514+ {
515+ base_addr = adjust_address (operands[0], SImode,
516+ start_bit / BITS_PER_UNIT);
517+ emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
518+ }
519+ else
520+ {
521+ rtx tmp = gen_reg_rtx (HImode);
522+
523+ base_addr = adjust_address (operands[0], HImode,
524+ start_bit / BITS_PER_UNIT);
525+ emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
526+ emit_insn (gen_unaligned_storehi (base_addr, tmp));
527+ }
528+ DONE;
529+ }
530+ else if (s_register_operand (operands[0], GET_MODE (operands[0])))
531+ {
532+ bool use_bfi = TRUE;
533+
534+ if (GET_CODE (operands[3]) == CONST_INT)
535+ {
536+ HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
537+
538+ if (val == 0)
539+ {
540+ emit_insn (gen_insv_zero (operands[0], operands[1],
541+ operands[2]));
542+ DONE;
543+ }
544+
545+ /* See if the set can be done with a single orr instruction. */
546+ if (val == mask && const_ok_for_arm (val << start_bit))
547+ use_bfi = FALSE;
548+ }
549+
550+ if (use_bfi)
551+ {
552+ if (GET_CODE (operands[3]) != REG)
553+ operands[3] = force_reg (SImode, operands[3]);
554+
555+ emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
556+ operands[3]));
557 DONE;
558 }
559-
560- /* See if the set can be done with a single orr instruction. */
561- if (val == mask && const_ok_for_arm (val << start_bit))
562- use_bfi = FALSE;
563- }
564-
565- if (use_bfi)
566- {
567- if (GET_CODE (operands[3]) != REG)
568- operands[3] = force_reg (SImode, operands[3]);
569-
570- emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
571- operands[3]));
572- DONE;
573- }
574+ }
575+ else
576+ FAIL;
577 }
578
579+ if (!s_register_operand (operands[0], GET_MODE (operands[0])))
580+ FAIL;
581+
582 target = copy_rtx (operands[0]);
583 /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical
584 subreg as the final target. */
585@@ -3685,7 +3720,7 @@
586
587 (define_expand "extzv"
588 [(set (match_dup 4)
589- (ashift:SI (match_operand:SI 1 "register_operand" "")
590+ (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "")
591 (match_operand:SI 2 "const_int_operand" "")))
592 (set (match_operand:SI 0 "register_operand" "")
593 (lshiftrt:SI (match_dup 4)
594@@ -3698,10 +3733,53 @@
595
596 if (arm_arch_thumb2)
597 {
598- emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
599- operands[3]));
600- DONE;
601+ HOST_WIDE_INT width = INTVAL (operands[2]);
602+ HOST_WIDE_INT bitpos = INTVAL (operands[3]);
603+
604+ if (unaligned_access && MEM_P (operands[1])
605+ && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
606+ {
607+ rtx base_addr;
608+
609+ if (width == 32)
610+ {
611+ base_addr = adjust_address (operands[1], SImode,
612+ bitpos / BITS_PER_UNIT);
613+ emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
614+ }
615+ else
616+ {
617+ rtx dest = operands[0];
618+ rtx tmp = gen_reg_rtx (SImode);
619+
620+ /* We may get a paradoxical subreg here. Strip it off. */
621+ if (GET_CODE (dest) == SUBREG
622+ && GET_MODE (dest) == SImode
623+ && GET_MODE (SUBREG_REG (dest)) == HImode)
624+ dest = SUBREG_REG (dest);
625+
626+ if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
627+ FAIL;
628+
629+ base_addr = adjust_address (operands[1], HImode,
630+ bitpos / BITS_PER_UNIT);
631+ emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
632+ emit_move_insn (gen_lowpart (SImode, dest), tmp);
633+ }
634+ DONE;
635+ }
636+ else if (s_register_operand (operands[1], GET_MODE (operands[1])))
637+ {
638+ emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
639+ operands[3]));
640+ DONE;
641+ }
642+ else
643+ FAIL;
644 }
645+
646+ if (!s_register_operand (operands[1], GET_MODE (operands[1])))
647+ FAIL;
648
649 operands[3] = GEN_INT (rshift);
650
651@@ -3716,7 +3794,113 @@
652 }"
653 )
654
655-(define_insn "extv"
656+(define_expand "extv"
657+ [(set (match_operand:SI 0 "s_register_operand" "")
658+ (sign_extract:SI (match_operand:SI 1 "nonimmediate_operand" "")
659+ (match_operand:SI 2 "const_int_operand" "")
660+ (match_operand:SI 3 "const_int_operand" "")))]
661+ "arm_arch_thumb2"
662+{
663+ HOST_WIDE_INT width = INTVAL (operands[2]);
664+ HOST_WIDE_INT bitpos = INTVAL (operands[3]);
665+
666+ if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
667+ && (bitpos % BITS_PER_UNIT) == 0)
668+ {
669+ rtx base_addr;
670+
671+ if (width == 32)
672+ {
673+ base_addr = adjust_address (operands[1], SImode,
674+ bitpos / BITS_PER_UNIT);
675+ emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
676+ }
677+ else
678+ {
679+ rtx dest = operands[0];
680+ rtx tmp = gen_reg_rtx (SImode);
681+
682+ /* We may get a paradoxical subreg here. Strip it off. */
683+ if (GET_CODE (dest) == SUBREG
684+ && GET_MODE (dest) == SImode
685+ && GET_MODE (SUBREG_REG (dest)) == HImode)
686+ dest = SUBREG_REG (dest);
687+
688+ if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
689+ FAIL;
690+
691+ base_addr = adjust_address (operands[1], HImode,
692+ bitpos / BITS_PER_UNIT);
693+ emit_insn (gen_unaligned_loadhis (tmp, base_addr));
694+ emit_move_insn (gen_lowpart (SImode, dest), tmp);
695+ }
696+
697+ DONE;
698+ }
699+ else if (!s_register_operand (operands[1], GET_MODE (operands[1])))
700+ FAIL;
701+})
702+
703+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
704+
705+(define_insn "unaligned_loadsi"
706+ [(set (match_operand:SI 0 "s_register_operand" "=l,r")
707+ (unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
708+ UNSPEC_UNALIGNED_LOAD))]
709+ "unaligned_access && TARGET_32BIT"
710+ "ldr%?\t%0, %1\t@ unaligned"
711+ [(set_attr "arch" "t2,any")
712+ (set_attr "length" "2,4")
713+ (set_attr "predicable" "yes")
714+ (set_attr "type" "load1")])
715+
716+(define_insn "unaligned_loadhis"
717+ [(set (match_operand:SI 0 "s_register_operand" "=l,r")
718+ (sign_extend:SI
719+ (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
720+ UNSPEC_UNALIGNED_LOAD)))]
721+ "unaligned_access && TARGET_32BIT"
722+ "ldr%(sh%)\t%0, %1\t@ unaligned"
723+ [(set_attr "arch" "t2,any")
724+ (set_attr "length" "2,4")
725+ (set_attr "predicable" "yes")
726+ (set_attr "type" "load_byte")])
727+
728+(define_insn "unaligned_loadhiu"
729+ [(set (match_operand:SI 0 "s_register_operand" "=l,r")
730+ (zero_extend:SI
731+ (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
732+ UNSPEC_UNALIGNED_LOAD)))]
733+ "unaligned_access && TARGET_32BIT"
734+ "ldr%(h%)\t%0, %1\t@ unaligned"
735+ [(set_attr "arch" "t2,any")
736+ (set_attr "length" "2,4")
737+ (set_attr "predicable" "yes")
738+ (set_attr "type" "load_byte")])
739+
740+(define_insn "unaligned_storesi"
741+ [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
742+ (unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
743+ UNSPEC_UNALIGNED_STORE))]
744+ "unaligned_access && TARGET_32BIT"
745+ "str%?\t%1, %0\t@ unaligned"
746+ [(set_attr "arch" "t2,any")
747+ (set_attr "length" "2,4")
748+ (set_attr "predicable" "yes")
749+ (set_attr "type" "store1")])
750+
751+(define_insn "unaligned_storehi"
752+ [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
753+ (unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
754+ UNSPEC_UNALIGNED_STORE))]
755+ "unaligned_access && TARGET_32BIT"
756+ "str%(h%)\t%1, %0\t@ unaligned"
757+ [(set_attr "arch" "t2,any")
758+ (set_attr "length" "2,4")
759+ (set_attr "predicable" "yes")
760+ (set_attr "type" "store1")])
761+
762+(define_insn "*extv_reg"
763 [(set (match_operand:SI 0 "s_register_operand" "=r")
764 (sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
765 (match_operand:SI 2 "const_int_operand" "M")
766@@ -6003,8 +6187,8 @@
767
768
769 (define_insn "*arm_movqi_insn"
770- [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
771- (match_operand:QI 1 "general_operand" "rI,K,m,r"))]
772+ [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,l,Uu,r,m")
773+ (match_operand:QI 1 "general_operand" "rI,K,Uu,l,m,r"))]
774 "TARGET_32BIT
775 && ( register_operand (operands[0], QImode)
776 || register_operand (operands[1], QImode))"
777@@ -6012,10 +6196,14 @@
778 mov%?\\t%0, %1
779 mvn%?\\t%0, #%B1
780 ldr%(b%)\\t%0, %1
781+ str%(b%)\\t%1, %0
782+ ldr%(b%)\\t%0, %1
783 str%(b%)\\t%1, %0"
784- [(set_attr "type" "*,*,load1,store1")
785- (set_attr "insn" "mov,mvn,*,*")
786- (set_attr "predicable" "yes")]
787+ [(set_attr "type" "*,*,load1,store1,load1,store1")
788+ (set_attr "insn" "mov,mvn,*,*,*,*")
789+ (set_attr "predicable" "yes")
790+ (set_attr "arch" "any,any,t2,t2,any,any")
791+ (set_attr "length" "4,4,2,2,4,4")]
792 )
793
794 (define_insn "*thumb1_movqi_insn"
795
796=== modified file 'gcc/config/arm/arm.opt'
797--- gcc/config/arm/arm.opt 2009-06-18 11:24:10 +0000
798+++ gcc/config/arm/arm.opt 2011-06-17 10:04:51 +0000
799@@ -169,3 +169,7 @@
800 Target Report Var(fix_cm3_ldrd) Init(2)
801 Avoid overlapping destination and address registers on LDRD instructions
802 that may trigger Cortex-M3 errata.
803+
804+munaligned-access
805+Target Report Var(unaligned_access) Init(2)
806+Enable unaligned word and halfword accesses to packed data.
807
808=== modified file 'gcc/config/arm/constraints.md'
809--- gcc/config/arm/constraints.md 2011-01-03 20:52:22 +0000
810+++ gcc/config/arm/constraints.md 2011-06-17 10:04:51 +0000
811@@ -36,6 +36,7 @@
812 ;; The following memory constraints have been used:
813 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
814 ;; in ARM state: Uq
815+;; in Thumb state: Uu, Uw
816
817
818 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
819@@ -327,6 +328,27 @@
820 (and (match_code "mem")
821 (match_test "REG_P (XEXP (op, 0))")))
822
823+(define_memory_constraint "Uu"
824+ "@internal
825+ In Thumb state an address that is valid in 16bit encoding."
826+ (and (match_code "mem")
827+ (match_test "TARGET_THUMB
828+ && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
829+ 0)")))
830+
831+; The 16-bit post-increment LDR/STR accepted by thumb1_legitimate_address_p
832+; are actually LDM/STM instructions, so cannot be used to access unaligned
833+; data.
834+(define_memory_constraint "Uw"
835+ "@internal
836+ In Thumb state an address that is valid in 16bit encoding, and that can be
837+ used for unaligned accesses."
838+ (and (match_code "mem")
839+ (match_test "TARGET_THUMB
840+ && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
841+ 0)
842+ && GET_CODE (XEXP (op, 0)) != POST_INC")))
843+
844 ;; We used to have constraint letters for S and R in ARM state, but
845 ;; all uses of these now appear to have been removed.
846
847
848=== modified file 'gcc/expmed.c'
849--- gcc/expmed.c 2011-05-22 19:02:59 +0000
850+++ gcc/expmed.c 2011-06-17 10:04:51 +0000
851@@ -703,7 +703,7 @@
852 /* On big-endian machines, we count bits from the most significant.
853 If the bit field insn does not, we must invert. */
854
855- if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
856+ if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
857 xbitpos = unit - bitsize - xbitpos;
858
859 /* We have been counting XBITPOS within UNIT.
860@@ -1554,7 +1554,7 @@
861
862 /* On big-endian machines, we count bits from the most significant.
863 If the bit field insn does not, we must invert. */
864- if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
865+ if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
866 xbitpos = unit - bitsize - xbitpos;
867
868 /* Now convert from counting within UNIT to counting in EXT_MODE. */
