Merge lp:~ams-codesourcery/gcc-linaro/cs-merge-20110413 into lp:gcc-linaro/4.5
Status: Rejected
Rejected by: Loïc Minier
Proposed branch: lp:~ams-codesourcery/gcc-linaro/cs-merge-20110413
Merge into: lp:gcc-linaro/4.5
Diff against target: 1208 lines (+841/-83), 16 files modified:
  ChangeLog.linaro (+125/-0)
  gcc/combine.c (+15/-1)
  gcc/config/arm/arm.c (+356/-5)
  gcc/config/arm/arm.md (+203/-31)
  gcc/config/arm/arm.opt (+4/-0)
  gcc/config/arm/unwind-arm.c (+30/-10)
  gcc/expmed.c (+2/-2)
  gcc/final.c (+6/-0)
  gcc/ifcvt.c (+4/-0)
  gcc/ipa-pure-const.c (+8/-1)
  gcc/testsuite/gcc.c-torture/compile/20110322-1.c (+22/-0)
  gcc/testsuite/gcc.dg/pr47763.c (+9/-0)
  gcc/testsuite/lib/target-supports.exp (+3/-2)
  gcc/tree-ssa-copyrename.c (+12/-0)
  gcc/web.c (+11/-1)
  libstdc++-v3/libsupc++/eh_arm.cc (+31/-30)
To merge this branch: bzr merge lp:~ams-codesourcery/gcc-linaro/cs-merge-20110413
Related bugs:
Reviewer: Andrew Stubbs (community), status: Needs Resubmitting
Review via email: mp+57486@code.launchpad.net
Commit message
Description of the change
Latest batch of merges from Sourcery G++.
Ira Rosen (irar) wrote:
Linaro Toolchain Builder (cbuild) wrote:
cbuild has taken a snapshot of this branch at r99502 and queued it for build.
The snapshot is available at:
http://
and will be built on the following builders:
a8-builder a9-builder i686 x86_64
You can track the build queue at:
http://
cbuild-snapshot: gcc-linaro-
cbuild-ancestor: lp:gcc-linaro+bzr99492
cbuild-state: check
Ramana Radhakrishnan (ramana) wrote:
Since this is a large feature branch, review is a bit harder in this case.
99493 - OK
99494 - OK
99495 - OK - has gone upstream.
99496 - OK - could go upstream into FSF 4.5 branch if possible.
99497 - OK
99498 - OK
99499 - OK if no regressions.
99500 - OK - looks sensible, but I would like to do a round of benchmarking; it can go in for sure.
99501 - Not fully reviewed. A first-cut review - I *think* this is a sensible approach, given that we can't just blanket remove SLOW_UNALIGNED_
99502 - Ambivalent; not fully reviewed. One comment is that we probably want some testcases for this, but overall it looks like a nice improvement. Should we take this at the start of the next release, so that it gets baked for three weeks rather than just a week before the release?
cheers
Ramana
Ramana Radhakrishnan (ramana) wrote:
As long as we can get bzr to commit these individually, so that any performance archaeology has a chance to benchmark each of these patches separately, I think these are OK.
99493 - OK
99494 - OK
99495 - OK - has gone upstream.
99496 - OK - could go upstream into FSF 4.5 branch if possible.
99497 - OK
99498 - OK
Can we commit 99500, 99501, and 99502 as individual patches and do the merge just after the April release?
Could we link 99499 to just the bug fix, i.e. one commit into gcc-linaro/4.5 that is the bug fix?
Sorry - I just think we should end up with a merge request per feature, or a merge request per bug fix, rather than one gigantic merge. I'm still not a power user of bzr, so I'm not totally sure if we can do all that's needed here.
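The per-feature workflow Ramana asks about can be sketched with bzr cherrypick merges. The revision numbers are the ones from this thread; the branch and user names are illustrative, and this is an untested sketch rather than an agreed recipe:

```shell
# Create a separate branch per feature so each gets its own merge request.
bzr branch lp:gcc-linaro/4.5 unaligned-loads
cd unaligned-loads

# Cherry-pick only the revision(s) belonging to one feature from the
# batch branch (-c merges the changes of a single revision):
bzr merge -c 99501 lp:~ams-codesourcery/gcc-linaro/cs-merge-20110413
bzr commit -m "Add support for unaligned loads/stores."

# Push the branch and then propose it for merging on Launchpad.
bzr push lp:~someuser/gcc-linaro/unaligned-loads
```

Note that a cherrypick merge in bzr does not record the merged revision in the target's ancestry, which is part of why the batch-versus-individual question is awkward.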
Ramana
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on i686-lucid-
The build results are available at:
http://
The test suite results changed compared to the branch point lp:gcc-linaro+bzr99492:
(Test names are truncated in this capture: ten new gcc.c-torture PASS results, and a block of gcc.dg tests that changed from PASS to UNSUPPORTED; the full listing is truncated.)
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on x86_64-
The build results are available at:
http://
The test suite results changed compared to the branch point lp:gcc-linaro+bzr99492:
(Test names are truncated in this capture: ten new gcc.c-torture PASS results, and a block of gcc.dg tests that changed from PASS to UNSUPPORTED; the full listing is truncated.)
Michael Hope (michaelh1) wrote:
I'm happy with merging all of the correctness fixes. Please leave the new features such as unaligned memcpy() out until after this release. I want to have time to try them with Python, SPEC, and EEMBC and have time to correct any problems.
Andrew Stubbs (ams-codesourcery) wrote:
> Sorry I just think we should then end up having a merge request per feature or
> a merge request for a bug fix rather than one gigantic merge. I'm still not a
> power user of bzr so not totally sure if we can do all that's needed here.
In the past we've treated CS patches as pre-reviewed (which they are), and committed them directly to lp:gcc-linaro. I've only really changed this as a means to get the extra testing before I check in (I'm going to have to check the gcc.dg/
That said, extra review is always valuable. :)
I deliberately chose to submit them as a batch, because if I had submitted them individually I doubt we'd have had all the test results in time, and anyway some depend on others, at least textually. I did/do intend to do a rebase and push, rather than a merge, so as not to flatten the history. I can leave out some revisions if they're broken, or reviewers don't like them.
Loïc Minier (lool) wrote:
In any case, the history won't be flattened if you do a merge; it will be shown as a merge in the bzr history, and all the individual original revisions will still be accessible; see bzr log -n0 or bzr log --include-merges to see them (or use bzr vis).
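The behaviour Loïc describes can be sketched as follows. The branch names are the ones from this proposal; the exact flags are from bzr's command set, but the session is illustrative, not a transcript:

```shell
# Merge the feature branch into the release branch; the merge is recorded
# as one mainline revision, but history is not flattened.
bzr branch lp:gcc-linaro/4.5 gcc-linaro-4.5
cd gcc-linaro-4.5
bzr merge lp:~ams-codesourcery/gcc-linaro/cs-merge-20110413
bzr commit -m "Merge lp:~ams-codesourcery/gcc-linaro/cs-merge-20110413."

# The merged revisions remain visible at deeper nesting levels:
bzr log -n0 -l 20          # show all nested (merged) revisions
bzr log --include-merges   # equivalent spelling
bzr visualise              # graphical history, if bzr-gtk is installed
```

So each of the 99493..99502 commits would still be individually addressable for benchmarking, which is what the "performance archaeology" request above needs.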
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on armv7l-
The build results are available at:
http://
The test suite results changed compared to the branch point lp:gcc-linaro+bzr99492:
(Test names are truncated in this capture: ten new gcc.c-torture PASS results, and a block of gcc.dg PASS and FAIL results that changed to UNSUPPORTED; the full listing is truncated.)
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on armv7l-
The build results are available at:
http://
The test suite results changed compared to the branch point lp:gcc-linaro+bzr99492:
(Test names are truncated in this capture: ten new gcc.c-torture PASS results, one gcc.c-torture FAIL and several gcc.dg FAIL/UNRESOLVED results that became PASS, and several gcc.dg results that changed to UNSUPPORTED; the full listing is truncated.)
Michael Hope (michaelh1) wrote:
I'd prefer to keep the new features for the next release. It's not far away and it'll give us some more breathing room.
Michael Hope (michaelh1) wrote:
Andrew Stubbs (ams-codesourcery) wrote:
This is now merged as far as 99498.
I'll resubmit the rest later.
Michael Hope (michaelh1) wrote:
For reference, I've gone through the patches and seen what's where. See:
https:/
or in text:
======== ========= ======== ========== ========== =======
Revision Bug In trunk In our 4.5 In our 4.6 Notes
======== ========= ======== ========== ========== =======
99493 PR45052 162528 Y (twice) Y
99494 175641 99498 N ABI conformance
99495 PR47763 170422 N N Dhrystone improvement
99496 PR47427 169226 N Y ICE
99497 171304 B N Bug
99498 N N N Bad debug info
99499 LP:736439 171840 N N Shrinkwrap
99500 N N N Needs benchmarking
99501 Unaligned; obsolete
99502 Unaligned; obsolete
======== ========= ======== ========== ========== =======
Based on this we should backport 99493, 99494, 99495, 99496, 99497, 99499, and 99500.
Michael Hope (michaelh1) wrote:
(well, except 99493 and 99496 which are already in our 4.6...)
Michael Hope (michaelh1) wrote:
99495 was picked up via the 4.6 release branch.
Michael Hope (michaelh1) wrote:
I've created merge requests for 99494, 99497, 99499 (4.6 + 4.5), and 99500.
Unmerged revisions
- 99502. By Andrew Stubbs
Add unaligned support for built-in memcpy.
Merged from Sourcery G++
- 99501. By Andrew Stubbs
Add support for unaligned loads/stores.
Merged from Sourcery G++
- 99500. By Andrew Stubbs
Fix performance regression when using NEON.
Merged from Sourcery G++
- 99499. By Andrew Stubbs
Fix shrink wrapping bug.
Merged from Sourcery G++
- 99498. By Andrew Stubbs
Fix a bug that generated bad debug info.
Merged from Sourcery G++
- 99497. By Andrew Stubbs
Fix a bug in __builtin_isgreaterequal.
Merged from Sourcery G++.
(Backport from FSF)
- 99496. By Andrew Stubbs
Fix internal compiler errors.
GCC Bugzilla #47427 & #47428.
Merged from Sourcery G++
(Backport from FSF)
- 99495. By Andrew Stubbs
Fix a performance regression in Dhrystone.
Merged from Sourcery G++.
- 99494. By Andrew Stubbs
Fix an ABI conformance issue that affected armcc interoperability.
Merged from Sourcery G++.
- 99493. By Andrew Stubbs
Fix a bug in which the "volatile" keyword could be ignored.
Merged from Sourcery G++.
(Backport from FSF)
Preview Diff
=== modified file 'ChangeLog.linaro'
--- ChangeLog.linaro 2011-04-11 09:52:39 +0000
+++ ChangeLog.linaro 2011-04-13 13:28:18 +0000
@@ -1,3 +1,128 @@
+2011-04-01 Julian Brown <julian@codesourcery.com>
+
+ Issue #4220
+
+ gcc/
+ * config/arm/arm.c (arm_block_move_unaligned_straight)
+ (arm_adjust_block_mem, arm_block_move_unaligned_loop)
+ (arm_movmemqi_unaligned): New.
+ (arm_gen_movmemqi): Support unaligned block copies.
+
+2011-04-01 Julian Brown <julian@codesourcery.com>
+
+ Issue #4220
+
+ gcc/
+ * config/arm/arm.c (arm_override_options): Add unaligned_access
+ support.
+ * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
+ (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
+ (insv, extzv): Add unaligned-access support.
+ (extv): Change to expander. Likewise.
+ (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
+ (unaligned_storesi, unaligned_storehi): New.
+ (*extv_reg): New (previous extv implementation).
+ * config/arm/arm.opt (munaligned_access): Add option.
+ * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
+ memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
+ (extract_bit_field_1): Likewise.
+
+2011-04-01 Julian Brown <julian@codesourcery.com>
+
+ Issue #6223
+
+ gcc/
+ * config/arm/arm.c (arm_vector_alignment_reachable): Don't use
+ peeling for alignment for NEON.
+
+ gcc/testsuite/
+ * lib/target-supports.exp
+ (check_effective_target_vector_alignment_reachable): Not true for
+ ARM NEON.
+
+2011-03-28 Bernd Schmidt <bernds@codesourcery.com>
+
+ gcc/
+ * ifcvt.c (cond_exec_process_insns): Don't convert the function
+ prologue.
+
+ gcc/testsuite/
+ * gcc.c-torture/compile/20110322-1.c: New test.
+
+2011-03-28 Paul Brook <paul@codesourcery.com>
+ Daniel Jacobowitz <dan@codesourcery.com>
+
+ Issue #3772
+ gcc/
+ * final.c (notice_source_line): Ignore zero length instructions.
+
+2011-03-22 Sandra Loosemore <sandra@codesourcery.com>
+
+ Issue #10483
+
+ gcc/
+ Backport from mainline:
+
+ 2011-03-22 Eric Botcazou <ebotcazou@adacore.com>
+
+ * combine.c (simplify_set): Try harder to find the best CC mode when
+ simplifying a nested COMPARE on the RHS.
+
+2011-03-17 Janis Johnson <janisjo@codesourcery.com>
+
+ Backport from FSF mainline:
+ 2011-01-25 Jakub Jelinek <jakub@redhat.com>
+
+ PR tree-optimization/47427
+ PR tree-optimization/47428
+ * tree-ssa-copyrename.c (copy_rename_partition_coalesce): Don't
+ coalesce if the new root var would be TREE_READONLY.
+
+2011-02-23 Jie Zhang <jie@codesourcery.com>
+
+ Issue #10134
+
+ Backport from mainline
+
+ gcc/
+ 2011-02-23 Jie Zhang <jie@codesourcery.com>
+
+ PR rtl-optimization/47763
+ * web.c (web_main): Ignore naked clobber when replacing register.
+
+ gcc/testsuite/
+ 2011-02-23 Jie Zhang <jie@codesourcery.com>
+
+ PR rtl-optimization/47763
+ * gcc.dg/pr47763.c: New test.
+
+2011-02-16 Nathan Sidwell <nathan@codesourcery.com>
+
+ Issue #10439
+ gcc/
+ * config/arm/unwind-arm.c (enum __cxa_type_match_result): New.
+ (cxa_type_match): Correct declaration.
+ (__gnu_unwind_pr_common): Reconstruct
+ additional indirection when __cxa_type_match returns
+ succeeded_with_ptr_to_base.
+
+ libstdc++/
+ * libsupc++/eh_arm.c (__cxa_type_match): Construct address of
+ thrown object here. Return succeded_with_ptr_to_base for all
+ pointer cases.
+
+2011-02-14 Kwok Cheung Yeung <kcy@codesourcery.com>
+
+ Issue #10417
+
+ Backport from mainline
+
+ gcc/
+ 2010-07-31 Richard Guenther <rguenther@suse.de>
+
+ PR tree-optimization/45052
+ * ipa-pure-const.c (check_stmt): Check volatileness.
+
 2011-04-10 Michael Hope <michael.hope@linaro.org>

 gcc/

=== modified file 'gcc/combine.c'
--- gcc/combine.c 2011-01-06 11:02:44 +0000
+++ gcc/combine.c 2011-04-13 13:28:18 +0000
@@ -5966,10 +5966,18 @@
 enum rtx_code new_code;
 rtx op0, op1, tmp;
 int other_changed = 0;
+ rtx inner_compare = NULL_RTX;
 enum machine_mode compare_mode = GET_MODE (dest);

 if (GET_CODE (src) == COMPARE)
- op0 = XEXP (src, 0), op1 = XEXP (src, 1);
+ {
+ op0 = XEXP (src, 0), op1 = XEXP (src, 1);
+ if (GET_CODE (op0) == COMPARE && op1 == const0_rtx)
+ {
+ inner_compare = op0;
+ op0 = XEXP (inner_compare, 0), op1 = XEXP (inner_compare, 1);
+ }
+ }
 else
 op0 = src, op1 = CONST0_RTX (GET_MODE (src));

@@ -6011,6 +6019,12 @@
 need to use a different CC mode here. */
 if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
 compare_mode = GET_MODE (op0);
+ else if (inner_compare
+ && GET_MODE_CLASS (GET_MODE (inner_compare)) == MODE_CC
+ && new_code == old_code
+ && op0 == XEXP (inner_compare, 0)
+ && op1 == XEXP (inner_compare, 1))
+ compare_mode = GET_MODE (inner_compare);
 else
 compare_mode = SELECT_CC_MODE (new_code, op0, op1);


=== modified file 'gcc/config/arm/arm.c'
--- gcc/config/arm/arm.c 2011-03-02 11:29:06 +0000
+++ gcc/config/arm/arm.c 2011-04-13 13:28:18 +0000
@@ -1896,6 +1896,22 @@
 if (arm_selected_tune->core == cortexm4)
 flag_schedule_interblock = 0;

+ /* Enable -munaligned-access by default for
+ - all ARMv6 architecture-based processors
+ - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+ Disable -munaligned-access by default for
+ - all pre-ARMv6 architecture-based processors
+ - ARMv6-M architecture-based processors. */
+
+ if (unaligned_access == 2)
+ {
+ if (arm_arch6 && (arm_arch_notm || arm_arch7))
+ unaligned_access = 1;
+ else
+ unaligned_access = 0;
+ }
+
 if (TARGET_THUMB1 && flag_schedule_insns)
 {
 /* Don't warn since it's on by default in -O2. */
@@ -10505,6 +10521,333 @@
 return true;
 }

+/* Copy a block of memory using plain ldr/str/ldrh/strh instructions, to permit
+ unaligned copies on processors which support unaligned semantics for those
+ instructions. INTERLEAVE_FACTOR can be used to attempt to hide load latency
+ (using more registers) by doing e.g. load/load/store/store for a factor of 2.
+ An interleave factor of 1 (the minimum) will perform no interleaving.
+ Load/store multiple are used for aligned addresses where possible. */
+
+static void
+arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
+ HOST_WIDE_INT length,
+ unsigned int interleave_factor)
+{
+ rtx *regs = XALLOCAVEC (rtx, interleave_factor);
+ HOST_WIDE_INT block_size_bytes = interleave_factor * UNITS_PER_WORD;
+ HOST_WIDE_INT i, j;
+ HOST_WIDE_INT remaining = length, words;
+ rtx halfword_tmp = NULL, byte_tmp = NULL;
+ rtx dst, src;
+ bool src_aligned = MEM_ALIGN (srcbase) >= BITS_PER_WORD;
+ bool dst_aligned = MEM_ALIGN (dstbase) >= BITS_PER_WORD;
+ HOST_WIDE_INT srcoffset, dstoffset;
+ HOST_WIDE_INT src_autoinc, dst_autoinc;
+ rtx mem, addr;
+
+ gcc_assert (1 <= interleave_factor && interleave_factor <= 4);
+
+ /* Use hard registers if we have aligned source or destination so we can use
+ load/store multiple with contiguous registers. */
+ if (dst_aligned || src_aligned)
+ for (i = 0; i < interleave_factor; i++)
+ regs[i] = gen_rtx_REG (SImode, i);
+ else
+ for (i = 0; i < interleave_factor; i++)
+ regs[i] = gen_reg_rtx (SImode);
+
+ dst = copy_addr_to_reg (XEXP (dstbase, 0));
+ src = copy_addr_to_reg (XEXP (srcbase, 0));
+
+ srcoffset = dstoffset = 0;
+
+ /* Calls to arm_gen_load_multiple and arm_gen_store_multiple update SRC/DST.
+ For copying the last bytes we want to subtract this offset again. */
+ src_autoinc = dst_autoinc = 0;
+
+ /* Copy BLOCK_SIZE_BYTES chunks. */
+
+ for (i = 0; i + block_size_bytes <= length; i += block_size_bytes)
+ {
+ /* Load words. */
+ if (src_aligned && interleave_factor > 1)
+ {
+ emit_insn (arm_gen_load_multiple (arm_regs_in_sequence,
+ interleave_factor, src, TRUE,
+ srcbase, &srcoffset));
+ src_autoinc += UNITS_PER_WORD * interleave_factor;
+ }
+ else
+ {
+ for (j = 0; j < interleave_factor; j++)
+ {
+ addr = plus_constant (src, srcoffset + j * UNITS_PER_WORD
+ - src_autoinc);
+ mem = adjust_automodify_address (srcbase, SImode, addr,
+ srcoffset + j * UNITS_PER_WORD);
+ emit_insn (gen_unaligned_loadsi (regs[j], mem));
+ }
+ srcoffset += block_size_bytes;
+ }
+
+ /* Store words. */
+ if (dst_aligned && interleave_factor > 1)
+ {
+ emit_insn (arm_gen_store_multiple (arm_regs_in_sequence,
+ interleave_factor, dst, TRUE,
+ dstbase, &dstoffset));
+ dst_autoinc += UNITS_PER_WORD * interleave_factor;
+ }
+ else
+ {
+ for (j = 0; j < interleave_factor; j++)
+ {
+ addr = plus_constant (dst, dstoffset + j * UNITS_PER_WORD
+ - dst_autoinc);
+ mem = adjust_automodify_address (dstbase, SImode, addr,
+ dstoffset + j * UNITS_PER_WORD);
+ emit_insn (gen_unaligned_storesi (mem, regs[j]));
+ }
+ dstoffset += block_size_bytes;
+ }
+
+ remaining -= block_size_bytes;
+ }
+
+ /* Copy any whole words left (note these aren't interleaved with any
+ subsequent halfword/byte load/stores in the interests of simplicity). */
+
+ words = remaining / UNITS_PER_WORD;
+
+ gcc_assert (words < interleave_factor);
+
+ if (src_aligned && words > 1)
+ {
+ emit_insn (arm_gen_load_multiple (arm_regs_in_sequence, words, src,
+ TRUE, srcbase, &srcoffset));
+ src_autoinc += UNITS_PER_WORD * words;
+ }
+ else
+ {
+ for (j = 0; j < words; j++)
+ {
+ addr = plus_constant (src,
+ srcoffset + j * UNITS_PER_WORD - src_autoinc);
+ mem = adjust_automodify_address (srcbase, SImode, addr,
+ srcoffset + j * UNITS_PER_WORD);
+ emit_insn (gen_unaligned_loadsi (regs[j], mem));
+ }
+ srcoffset += words * UNITS_PER_WORD;
+ }
+
+ if (dst_aligned && words > 1)
+ {
+ emit_insn (arm_gen_store_multiple (arm_regs_in_sequence, words, dst,
+ TRUE, dstbase, &dstoffset));
+ dst_autoinc += words * UNITS_PER_WORD;
+ }
+ else
+ {
+ for (j = 0; j < words; j++)
+ {
+ addr = plus_constant (dst,
+ dstoffset + j * UNITS_PER_WORD - dst_autoinc);
+ mem = adjust_automodify_address (dstbase, SImode, addr,
+ dstoffset + j * UNITS_PER_WORD);
+ emit_insn (gen_unaligned_storesi (mem, regs[j]));
+ }
+ dstoffset += words * UNITS_PER_WORD;
+ }
+
+ remaining -= words * UNITS_PER_WORD;
+
+ gcc_assert (remaining < 4);
+
+ /* Copy a halfword if necessary. */
+
+ if (remaining >= 2)
+ {
+ halfword_tmp = gen_reg_rtx (SImode);
+
+ addr = plus_constant (src, srcoffset - src_autoinc);
+ mem = adjust_automodify_address (srcbase, HImode, addr, srcoffset);
+ emit_insn (gen_unaligned_loadhiu (halfword_tmp, mem));
+
+ /* Either write out immediately, or delay until we've loaded the last
+ byte, depending on interleave factor. */
+ if (interleave_factor == 1)
+ {
+ addr = plus_constant (dst, dstoffset - dst_autoinc);
+ mem = adjust_automodify_address (dstbase, HImode, addr, dstoffset);
+ emit_insn (gen_unaligned_storehi (mem,
+ gen_lowpart (HImode, halfword_tmp)));
+ halfword_tmp = NULL;
+ dstoffset += 2;
+ }
+
+ remaining -= 2;
+ srcoffset += 2;
+ }
+
+ gcc_assert (remaining < 2);
+
+ /* Copy last byte. */
+
+ if ((remaining & 1) != 0)
+ {
+ byte_tmp = gen_reg_rtx (SImode);
+
+ addr = plus_constant (src, srcoffset - src_autoinc);
+ mem = adjust_automodify_address (srcbase, QImode, addr, srcoffset);
+ emit_move_insn (gen_lowpart (QImode, byte_tmp), mem);
+
+ if (interleave_factor == 1)
+ {
+ addr = plus_constant (dst, dstoffset - dst_autoinc);
+ mem = adjust_automodify_address (dstbase, QImode, addr, dstoffset);
+ emit_move_insn (mem, gen_lowpart (QImode, byte_tmp));
+ byte_tmp = NULL;
+ dstoffset++;
+ }
+
+ remaining--;
+ srcoffset++;
+ }
+
+ /* Store last halfword if we haven't done so already. */
+
+ if (halfword_tmp)
+ {
+ addr = plus_constant (dst, dstoffset - dst_autoinc);
+ mem = adjust_automodify_address (dstbase, HImode, addr, dstoffset);
+ emit_insn (gen_unaligned_storehi (mem,
+ gen_lowpart (HImode, halfword_tmp)));
+ dstoffset += 2;
+ }
+
+ /* Likewise for last byte. */
+
+ if (byte_tmp)
+ {
+ addr = plus_constant (dst, dstoffset - dst_autoinc);
+ mem = adjust_automodify_address (dstbase, QImode, addr, dstoffset);
+ emit_move_insn (mem, gen_lowpart (QImode, byte_tmp));
+ dstoffset++;
+ }
+
+ gcc_assert (remaining == 0 && srcoffset == dstoffset);
+}
+
+/* From mips_adjust_block_mem:
+
+ Helper function for doing a loop-based block operation on memory
+ reference MEM. Each iteration of the loop will operate on LENGTH
+ bytes of MEM.
+
+ Create a new base register for use within the loop and point it to
+ the start of MEM. Create a new memory reference that uses this
+ register. Store them in *LOOP_REG and *LOOP_MEM respectively. */
+
+static void
+arm_adjust_block_mem (rtx mem, HOST_WIDE_INT length, rtx *loop_reg,
+ rtx *loop_mem)
+{
+ *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
+
+ /* Although the new mem does not refer to a known location,
+ it does keep up to LENGTH bytes of alignment. */
+ *loop_mem = change_address (mem, BLKmode, *loop_reg);
+ set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+}
+
+/* From mips_block_move_loop:
+
+ Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
+ bytes at a time. LENGTH must be at least BYTES_PER_ITER. Assume that
+ the memory regions do not overlap. */
+
+static void
+arm_block_move_unaligned_loop (rtx dest, rtx src, HOST_WIDE_INT length,
+ unsigned int interleave_factor,
+ HOST_WIDE_INT bytes_per_iter)
+{
+ rtx label, src_reg, dest_reg, final_src, test;
+ HOST_WIDE_INT leftover;
+
+ leftover = length % bytes_per_iter;
+ length -= leftover;
+
+ /* Create registers and memory references for use within the loop. */
+ arm_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
+ arm_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
+
+ /* Calculate the value that SRC_REG should have after the last iteration of
+ the loop. */
+ final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length),
+ 0, 0, OPTAB_WIDEN);
+
+ /* Emit the start of the loop. */
+ label = gen_label_rtx ();
+ emit_label (label);
+
+ /* Emit the loop body. */
+ arm_block_move_unaligned_straight (dest, src, bytes_per_iter,
+ interleave_factor);
+
+ /* Move on to the next block. */
+ emit_move_insn (src_reg, plus_constant (src_reg, bytes_per_iter));
+ emit_move_insn (dest_reg, plus_constant (dest_reg, bytes_per_iter));
+
+ /* Emit the loop condition. */
+ test = gen_rtx_NE (VOIDmode, src_reg, final_src);
+ emit_jump_insn (gen_cbranchsi4 (test, src_reg, final_src, label));
+
+ /* Mop up any left-over bytes. */
+ if (leftover)
+ arm_block_move_unaligned_straight (dest, src, leftover, interleave_factor);
+}
+
+/* Emit a block move when either the source or destination is unaligned (not
+ aligned to a four-byte boundary). This may need further tuning depending on
+ core type, optimize_size setting, alignment of source/destination, etc. */
+
+static int
+arm_movmemqi_unaligned (rtx *operands)
+{
+ HOST_WIDE_INT length = INTVAL (operands[2]);
+
+ if (optimize_size)
+ {
+ bool src_aligned = MEM_ALIGN (operands[1]) >= BITS_PER_WORD;
+ bool dst_aligned = MEM_ALIGN (operands[0]) >= BITS_PER_WORD;
+ /* Inlined memcpy using ldr/str/ldrh/strh can be quite big: try to limit
+ size of code if optimizing for size. We'll use ldm/stm if src_aligned
+ or dst_aligned though: allow more interleaving in those cases since the
+ resulting code can be smaller. */
+ unsigned int interleave_factor = (src_aligned || dst_aligned) ? 2 : 1;
+ HOST_WIDE_INT bytes_per_iter = (src_aligned || dst_aligned) ? 8 : 4;
+
+ if (length > 12)
+ arm_block_move_unaligned_loop (operands[0], operands[1], length,
+ interleave_factor, bytes_per_iter);
+ else
+ arm_block_move_unaligned_straight (operands[0], operands[1], length,
+ interleave_factor);
+ }
+ else
+ {
+ /* Note that the loop created by arm_block_move_unaligned_loop may be
+ subject to loop unrolling, which makes tuning this condition a little
+ awkward. */
+ if (length > 32)
+ arm_block_move_unaligned_loop (operands[0], operands[1], length, 4, 16);
+ else
+ arm_block_move_unaligned_straight (operands[0], operands[1], length, 4);
+ }
+
+ return 1;
+}
+
 int
 arm_gen_movmemqi (rtx *operands)
 {
@@ -10517,8 +10860,13 @@

 if (GET_CODE (operands[2]) != CONST_INT
 || GET_CODE (operands[3]) != CONST_INT
- || INTVAL (operands[2]) > 64
- || INTVAL (operands[3]) & 3)
+ || INTVAL (operands[2]) > 64)
+ return 0;
+
+ if (unaligned_access && (INTVAL (operands[3]) & 3) != 0)
+ return arm_movmemqi_unaligned (operands);
+
+ if (INTVAL (operands[3]) & 3)
 return 0;

 dstbase = operands[0];
@@ -23095,10 +23443,13 @@
 static bool
 arm_vector_alignment_reachable (const_tree type, bool is_packed)
 {
- /* Vectors which aren't in packed structures will not be less aligned than
- the natural alignment of their element type, so this is safe. */
+ /* NOTE: returning true here will unconditionally peel loop iterations so
+ that aligned accesses can be used. This is undesirable when misaligned
+ accesses are available, particularly for small loop iteration counts,
+ since the overhead for dispatching to multiple versions of the loop is
+ quite high. */
 if (TARGET_NEON && !BYTES_BIG_ENDIAN)
- return !is_packed;
+ return false;

 return default_builtin_vector_alignment_reachable (type, is_packed);
 }

=== modified file 'gcc/config/arm/arm.md'
--- gcc/config/arm/arm.md 2011-03-11 14:26:34 +0000
+++ gcc/config/arm/arm.md 2011-04-13 13:28:18 +0000
@@ -104,6 +104,8 @@
 (UNSPEC_SYMBOL_OFFSET 27) ; The offset of the start of the symbol from
 ; another symbolic address.
 (UNSPEC_MEMORY_BARRIER 28) ; Represent a memory barrier.
+ (UNSPEC_UNALIGNED_LOAD 29)
+ (UNSPEC_UNALIGNED_STORE 30)
 ]
 )

@@ -2372,7 +2374,7 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.

 (define_expand "insv"
- [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
+ [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
 (match_operand:SI 1 "general_operand" "")
 (match_operand:SI 2 "general_operand" ""))
 (match_operand:SI 3 "reg_or_int_operand" ""))]
@@ -2386,35 +2388,66 @@

 if (arm_arch_thumb2)
 {
- bool use_bfi = TRUE;
-
- if (GET_CODE (operands[3]) == CONST_INT)
- {
- HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
-
- if (val == 0)
- {
- emit_insn (gen_insv_zero (operands[0], operands[1],
- operands[2]));
+ if (unaligned_access && MEM_P (operands[0])
+ && s_register_operand (operands[3], GET_MODE (operands[3]))
+ && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
+ {
+ rtx base_addr;
+
+ if (width == 32)
+ {
+ base_addr = adjust_address (operands[0], SImode,
+ start_bit / BITS_PER_UNIT);
+ emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
+ }
+ else
+ {
 rtx tmp = gen_reg_rtx (HImode);
614 | + rtx tmp = gen_reg_rtx (HImode); |
615 | + |
616 | + base_addr = adjust_address (operands[0], HImode, |
617 | + start_bit / BITS_PER_UNIT); |
618 | + emit_move_insn (tmp, gen_lowpart (HImode, operands[3])); |
619 | + emit_insn (gen_unaligned_storehi (base_addr, tmp)); |
620 | + } |
621 | + DONE; |
622 | + } |
623 | + else if (s_register_operand (operands[0], GET_MODE (operands[0]))) |
624 | + { |
625 | + bool use_bfi = TRUE; |
626 | + |
627 | + if (GET_CODE (operands[3]) == CONST_INT) |
628 | + { |
629 | + HOST_WIDE_INT val = INTVAL (operands[3]) & mask; |
630 | + |
631 | + if (val == 0) |
632 | + { |
633 | + emit_insn (gen_insv_zero (operands[0], operands[1], |
634 | + operands[2])); |
635 | + DONE; |
636 | + } |
637 | + |
638 | + /* See if the set can be done with a single orr instruction. */ |
639 | + if (val == mask && const_ok_for_arm (val << start_bit)) |
640 | + use_bfi = FALSE; |
641 | + } |
642 | + |
643 | + if (use_bfi) |
644 | + { |
645 | + if (GET_CODE (operands[3]) != REG) |
646 | + operands[3] = force_reg (SImode, operands[3]); |
647 | + |
648 | + emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2], |
649 | + operands[3])); |
650 | DONE; |
651 | } |
652 | - |
653 | - /* See if the set can be done with a single orr instruction. */ |
654 | - if (val == mask && const_ok_for_arm (val << start_bit)) |
655 | - use_bfi = FALSE; |
656 | - } |
657 | - |
658 | - if (use_bfi) |
659 | - { |
660 | - if (GET_CODE (operands[3]) != REG) |
661 | - operands[3] = force_reg (SImode, operands[3]); |
662 | - |
663 | - emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2], |
664 | - operands[3])); |
665 | - DONE; |
666 | - } |
667 | + } |
668 | + else |
669 | + FAIL; |
670 | } |
671 | |
672 | + if (!s_register_operand (operands[0], GET_MODE (operands[0]))) |
673 | + FAIL; |
674 | + |
675 | target = copy_rtx (operands[0]); |
676 | /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical |
677 | subreg as the final target. */ |
678 | @@ -3604,7 +3637,7 @@ |
679 | |
680 | (define_expand "extzv" |
681 | [(set (match_dup 4) |
682 | - (ashift:SI (match_operand:SI 1 "register_operand" "") |
683 | + (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "") |
684 | (match_operand:SI 2 "const_int_operand" ""))) |
685 | (set (match_operand:SI 0 "register_operand" "") |
686 | (lshiftrt:SI (match_dup 4) |
687 | @@ -3617,10 +3650,53 @@ |
688 | |
689 | if (arm_arch_thumb2) |
690 | { |
691 | - emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2], |
692 | - operands[3])); |
693 | - DONE; |
694 | + HOST_WIDE_INT width = INTVAL (operands[2]); |
695 | + HOST_WIDE_INT bitpos = INTVAL (operands[3]); |
696 | + |
697 | + if (unaligned_access && MEM_P (operands[1]) |
698 | + && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0) |
699 | + { |
700 | + rtx base_addr; |
701 | + |
702 | + if (width == 32) |
703 | + { |
704 | + base_addr = adjust_address (operands[1], SImode, |
705 | + bitpos / BITS_PER_UNIT); |
706 | + emit_insn (gen_unaligned_loadsi (operands[0], base_addr)); |
707 | + } |
708 | + else |
709 | + { |
710 | + rtx dest = operands[0]; |
711 | + rtx tmp = gen_reg_rtx (SImode); |
712 | + |
713 | + /* We may get a paradoxical subreg here. Strip it off. */ |
714 | + if (GET_CODE (dest) == SUBREG |
715 | + && GET_MODE (dest) == SImode |
716 | + && GET_MODE (SUBREG_REG (dest)) == HImode) |
717 | + dest = SUBREG_REG (dest); |
718 | + |
719 | + if (GET_MODE_BITSIZE (GET_MODE (dest)) != width) |
720 | + FAIL; |
721 | + |
722 | + base_addr = adjust_address (operands[1], HImode, |
723 | + bitpos / BITS_PER_UNIT); |
724 | + emit_insn (gen_unaligned_loadhiu (tmp, base_addr)); |
725 | + emit_move_insn (gen_lowpart (SImode, dest), tmp); |
726 | + } |
727 | + DONE; |
728 | + } |
729 | + else if (s_register_operand (operands[1], GET_MODE (operands[1]))) |
730 | + { |
731 | + emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2], |
732 | + operands[3])); |
733 | + DONE; |
734 | + } |
735 | + else |
736 | + FAIL; |
737 | } |
738 | + |
739 | + if (!s_register_operand (operands[1], GET_MODE (operands[1]))) |
740 | + FAIL; |
741 | |
742 | operands[3] = GEN_INT (rshift); |
743 | |
744 | @@ -3635,7 +3711,103 @@ |
745 | }" |
746 | ) |
747 | |
748 | -(define_insn "extv" |
749 | +(define_expand "extv" |
750 | + [(set (match_operand 0 "s_register_operand" "") |
751 | + (sign_extract (match_operand 1 "nonimmediate_operand" "") |
752 | + (match_operand 2 "const_int_operand" "") |
753 | + (match_operand 3 "const_int_operand" "")))] |
754 | + "arm_arch_thumb2" |
755 | +{ |
756 | + HOST_WIDE_INT width = INTVAL (operands[2]); |
757 | + HOST_WIDE_INT bitpos = INTVAL (operands[3]); |
758 | + |
759 | + if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32) |
760 | + && (bitpos % BITS_PER_UNIT) == 0) |
761 | + { |
762 | + rtx base_addr; |
763 | + |
764 | + if (width == 32) |
765 | + { |
766 | + base_addr = adjust_address (operands[1], SImode, |
767 | + bitpos / BITS_PER_UNIT); |
768 | + emit_insn (gen_unaligned_loadsi (operands[0], base_addr)); |
769 | + } |
770 | + else |
771 | + { |
772 | + rtx dest = operands[0]; |
773 | + rtx tmp = gen_reg_rtx (SImode); |
774 | + |
775 | + /* We may get a paradoxical subreg here. Strip it off. */ |
776 | + if (GET_CODE (dest) == SUBREG |
777 | + && GET_MODE (dest) == SImode |
778 | + && GET_MODE (SUBREG_REG (dest)) == HImode) |
779 | + dest = SUBREG_REG (dest); |
780 | + |
781 | + if (GET_MODE_BITSIZE (GET_MODE (dest)) != width) |
782 | + FAIL; |
783 | + |
784 | + base_addr = adjust_address (operands[1], HImode, |
785 | + bitpos / BITS_PER_UNIT); |
786 | + emit_insn (gen_unaligned_loadhis (tmp, base_addr)); |
787 | + emit_move_insn (gen_lowpart (SImode, dest), tmp); |
788 | + } |
789 | + |
790 | + DONE; |
791 | + } |
792 | + else if (s_register_operand (operands[1], GET_MODE (operands[1]))) |
793 | + DONE; |
794 | + else |
795 | + FAIL; |
796 | +}) |
797 | + |
798 | +; ARMv6+ unaligned load/store instructions (used for packed structure accesses). |
799 | + |
800 | +(define_insn "unaligned_loadsi" |
801 | + [(set (match_operand:SI 0 "s_register_operand" "=r") |
802 | + (unspec:SI [(match_operand:SI 1 "memory_operand" "m")] |
803 | + UNSPEC_UNALIGNED_LOAD))] |
804 | + "unaligned_access" |
805 | + "ldr%?\t%0, %1\t@ unaligned" |
806 | + [(set_attr "predicable" "yes") |
807 | + (set_attr "type" "load1")]) |
808 | + |
809 | +(define_insn "unaligned_loadhis" |
810 | + [(set (match_operand:SI 0 "s_register_operand" "=r") |
811 | + (sign_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")] |
812 | + UNSPEC_UNALIGNED_LOAD)))] |
813 | + "unaligned_access" |
814 | + "ldr%(sh%)\t%0, %1\t@ unaligned" |
815 | + [(set_attr "predicable" "yes") |
816 | + (set_attr "type" "load_byte")]) |
817 | + |
818 | +(define_insn "unaligned_loadhiu" |
819 | + [(set (match_operand:SI 0 "s_register_operand" "=r") |
820 | + (zero_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")] |
821 | + UNSPEC_UNALIGNED_LOAD)))] |
822 | + "unaligned_access" |
823 | + "ldr%(h%)\t%0, %1\t@ unaligned" |
824 | + [(set_attr "predicable" "yes") |
825 | + (set_attr "type" "load_byte")]) |
826 | + |
827 | +(define_insn "unaligned_storesi" |
828 | + [(set (match_operand:SI 0 "memory_operand" "=m") |
829 | + (unspec:SI [(match_operand:SI 1 "s_register_operand" "r")] |
830 | + UNSPEC_UNALIGNED_STORE))] |
831 | + "unaligned_access" |
832 | + "str%?\t%1, %0\t@ unaligned" |
833 | + [(set_attr "predicable" "yes") |
834 | + (set_attr "type" "store1")]) |
835 | + |
836 | +(define_insn "unaligned_storehi" |
837 | + [(set (match_operand:HI 0 "memory_operand" "=m") |
838 | + (unspec:HI [(match_operand:HI 1 "s_register_operand" "r")] |
839 | + UNSPEC_UNALIGNED_STORE))] |
840 | + "unaligned_access" |
841 | + "str%(h%)\t%1, %0\t@ unaligned" |
842 | + [(set_attr "predicable" "yes") |
843 | + (set_attr "type" "store1")]) |
844 | + |
845 | +(define_insn "*extv_reg" |
846 | [(set (match_operand:SI 0 "s_register_operand" "=r") |
847 | (sign_extract:SI (match_operand:SI 1 "s_register_operand" "r") |
848 | (match_operand:SI 2 "const_int_operand" "M") |
849 | |
850 | === modified file 'gcc/config/arm/arm.opt' |
851 | --- gcc/config/arm/arm.opt 2010-08-05 15:20:54 +0000 |
852 | +++ gcc/config/arm/arm.opt 2011-04-13 13:28:18 +0000 |
853 | @@ -173,3 +173,7 @@ |
854 | Target Report Var(fix_cm3_ldrd) Init(2) |
855 | Avoid overlapping destination and address registers on LDRD instructions |
856 | that may trigger Cortex-M3 errata. |
857 | + |
858 | +munaligned-access |
859 | +Target Report Var(unaligned_access) Init(2) |
860 | +Enable unaligned word and halfword accesses to packed data. |
861 | |
862 | === modified file 'gcc/config/arm/unwind-arm.c' |
863 | --- gcc/config/arm/unwind-arm.c 2010-08-12 12:39:35 +0000 |
864 | +++ gcc/config/arm/unwind-arm.c 2011-04-13 13:28:18 +0000 |
865 | @@ -32,13 +32,18 @@ |
866 | typedef unsigned char bool; |
867 | |
868 | typedef struct _ZSt9type_info type_info; /* This names C++ type_info type */ |
869 | +enum __cxa_type_match_result |
870 | + { |
871 | + ctm_failed = 0, |
872 | + ctm_succeeded = 1, |
873 | + ctm_succeeded_with_ptr_to_base = 2 |
874 | + }; |
875 | |
876 | void __attribute__((weak)) __cxa_call_unexpected(_Unwind_Control_Block *ucbp); |
877 | bool __attribute__((weak)) __cxa_begin_cleanup(_Unwind_Control_Block *ucbp); |
878 | -bool __attribute__((weak)) __cxa_type_match(_Unwind_Control_Block *ucbp, |
879 | - const type_info *rttip, |
880 | - bool is_reference, |
881 | - void **matched_object); |
882 | +enum __cxa_type_match_result __attribute__((weak)) __cxa_type_match |
883 | + (_Unwind_Control_Block *ucbp, const type_info *rttip, |
884 | + bool is_reference, void **matched_object); |
885 | |
886 | _Unwind_Ptr __attribute__((weak)) |
887 | __gnu_Unwind_Find_exidx (_Unwind_Ptr, int *); |
888 | @@ -1107,6 +1112,7 @@ |
889 | _uw rtti; |
890 | bool is_reference = (data[0] & uint32_highbit) != 0; |
891 | void *matched; |
892 | + enum __cxa_type_match_result match_type; |
893 | |
894 | /* Check for no-throw areas. */ |
895 | if (data[1] == (_uw) -2) |
896 | @@ -1118,17 +1124,31 @@ |
897 | { |
898 | /* Match a catch specification. */ |
899 | rtti = _Unwind_decode_target2 ((_uw) &data[1]); |
900 | - if (!__cxa_type_match (ucbp, (type_info *) rtti, |
901 | - is_reference, |
902 | - &matched)) |
903 | - matched = (void *)0; |
904 | + match_type = __cxa_type_match (ucbp, |
905 | + (type_info *) rtti, |
906 | + is_reference, |
907 | + &matched); |
908 | } |
909 | + else |
910 | + match_type = ctm_succeeded; |
911 | |
912 | - if (matched) |
913 | + if (match_type) |
914 | { |
915 | ucbp->barrier_cache.sp = |
916 | _Unwind_GetGR (context, R_SP); |
917 | - ucbp->barrier_cache.bitpattern[0] = (_uw) matched; |
918 | + // ctm_succeeded_with_ptr_to_base really |
919 | + // means _c_t_m indirected the pointer |
920 | + // object. We have to reconstruct the |
921 | + // additional pointer layer by using a temporary. |
922 | + if (match_type == ctm_succeeded_with_ptr_to_base) |
923 | + { |
924 | + ucbp->barrier_cache.bitpattern[2] |
925 | + = (_uw) matched; |
926 | + ucbp->barrier_cache.bitpattern[0] |
927 | + = (_uw) &ucbp->barrier_cache.bitpattern[2]; |
928 | + } |
929 | + else |
930 | + ucbp->barrier_cache.bitpattern[0] = (_uw) matched; |
931 | ucbp->barrier_cache.bitpattern[1] = (_uw) data; |
932 | return _URC_HANDLER_FOUND; |
933 | } |
934 | |
935 | === modified file 'gcc/expmed.c' |
936 | --- gcc/expmed.c 2011-01-18 19:06:14 +0000 |
937 | +++ gcc/expmed.c 2011-04-13 13:28:18 +0000 |
938 | @@ -720,7 +720,7 @@ |
939 | /* On big-endian machines, we count bits from the most significant. |
940 | If the bit field insn does not, we must invert. */ |
941 | |
942 | - if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN) |
943 | + if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0)) |
944 | xbitpos = unit - bitsize - xbitpos; |
945 | |
946 | /* We have been counting XBITPOS within UNIT. |
947 | @@ -1564,7 +1564,7 @@ |
948 | |
949 | /* On big-endian machines, we count bits from the most significant. |
950 | If the bit field insn does not, we must invert. */ |
951 | - if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN) |
952 | + if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0)) |
953 | xbitpos = unit - bitsize - xbitpos; |
954 | |
955 | /* Now convert from counting within UNIT to counting in EXT_MODE. */ |
956 | |
957 | === modified file 'gcc/final.c' |
958 | --- gcc/final.c 2011-02-08 10:51:58 +0000 |
959 | +++ gcc/final.c 2011-04-13 13:28:18 +0000 |
960 | @@ -2751,6 +2751,12 @@ |
961 | if (filename == NULL) |
962 | return false; |
963 | |
964 | +#ifdef HAVE_ATTR_length |
965 | + /* Prevent duplicate line markers at the same location. */ |
966 | + if (get_attr_length (insn) == 0) |
967 | + return false; |
968 | +#endif |
969 | + |
970 | if (force_source_line |
971 | || filename != last_filename |
972 | || last_linenum != linenum) |
973 | |
974 | === modified file 'gcc/ifcvt.c' |
975 | --- gcc/ifcvt.c 2011-02-08 12:07:29 +0000 |
976 | +++ gcc/ifcvt.c 2011-04-13 13:28:18 +0000 |
977 | @@ -311,6 +311,10 @@ |
978 | |
979 | for (insn = start; ; insn = NEXT_INSN (insn)) |
980 | { |
981 | + /* dwarf2out can't cope with conditional prologues. */ |
982 | + if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END) |
983 | + return FALSE; |
984 | + |
985 | if (NOTE_P (insn) || DEBUG_INSN_P (insn)) |
986 | goto insn_done; |
987 | |
988 | |
989 | === modified file 'gcc/ipa-pure-const.c' |
990 | --- gcc/ipa-pure-const.c 2010-11-15 22:27:24 +0000 |
991 | +++ gcc/ipa-pure-const.c 2011-04-13 13:28:18 +0000 |
992 | @@ -422,7 +422,14 @@ |
993 | if (dump_file) |
994 | fprintf (dump_file, " Volatile stmt is not const/pure\n"); |
995 | } |
996 | - |
997 | + |
998 | + if (gimple_has_volatile_ops (stmt)) |
999 | + { |
1000 | + local->pure_const_state = IPA_NEITHER; |
1001 | + if (dump_file) |
1002 | + fprintf (dump_file, " Volatile stmt is not const/pure\n"); |
1003 | + } |
1004 | + |
1005 | /* Look for loads and stores. */ |
1006 | walk_stmt_load_store_ops (stmt, local, check_load, check_store); |
1007 | |
1008 | |
1009 | === added file 'gcc/testsuite/gcc.c-torture/compile/20110322-1.c' |
1010 | --- gcc/testsuite/gcc.c-torture/compile/20110322-1.c 1970-01-01 00:00:00 +0000 |
1011 | +++ gcc/testsuite/gcc.c-torture/compile/20110322-1.c 2011-04-13 13:28:18 +0000 |
1012 | @@ -0,0 +1,22 @@ |
1013 | +void asn1_length_der (unsigned long int len, unsigned char *ans, int *ans_len) |
1014 | +{ |
1015 | + int k; |
1016 | + unsigned char temp[4]; |
1017 | + if (len < 128) { |
1018 | + if (ans != ((void *) 0)) |
1019 | + ans[0] = (unsigned char) len; |
1020 | + *ans_len = 1; |
1021 | + } else { |
1022 | + k = 0; |
1023 | + while (len) { |
1024 | + temp[k++] = len & 0xFF; |
1025 | + len = len >> 8; |
1026 | + } |
1027 | + *ans_len = k + 1; |
1028 | + if (ans != ((void *) 0)) { |
1029 | + ans[0] = ((unsigned char) k & 0x7F) + 128; |
1030 | + while (k--) |
1031 | + ans[*ans_len - 1 - k] = temp[k]; |
1032 | + } |
1033 | + } |
1034 | +} |
1035 | |
1036 | === added file 'gcc/testsuite/gcc.dg/pr47763.c' |
1037 | --- gcc/testsuite/gcc.dg/pr47763.c 1970-01-01 00:00:00 +0000 |
1038 | +++ gcc/testsuite/gcc.dg/pr47763.c 2011-04-13 13:28:18 +0000 |
1039 | @@ -0,0 +1,9 @@ |
1040 | +/* { dg-do compile } */ |
1041 | +/* { dg-options "-O2 -funroll-loops -fdump-rtl-web" } */ |
1042 | + |
1043 | +foo() |
1044 | +{ |
1045 | +} |
1046 | + |
1047 | +/* { dg-final { scan-rtl-dump-not "Web oldreg" "web" } } */ |
1048 | +/* { dg-final { cleanup-rtl-dump "web" } } */ |
1049 | |
1050 | === modified file 'gcc/testsuite/lib/target-supports.exp' |
1051 | --- gcc/testsuite/lib/target-supports.exp 2011-02-22 11:38:56 +0000 |
1052 | +++ gcc/testsuite/lib/target-supports.exp 2011-04-13 13:28:18 +0000 |
1053 | @@ -2722,8 +2722,9 @@ |
1054 | if [info exists et_vector_alignment_reachable_saved] { |
1055 | verbose "check_effective_target_vector_alignment_reachable: using cached result" 2 |
1056 | } else { |
1057 | - if { [check_effective_target_vect_aligned_arrays] |
1058 | - || [check_effective_target_natural_alignment_32] } { |
1059 | + if { ([check_effective_target_vect_aligned_arrays] |
1060 | + || [check_effective_target_natural_alignment_32]) |
1061 | + && !([istarget arm*-*-*] && [check_effective_target_arm_neon]) } { |
1062 | set et_vector_alignment_reachable_saved 1 |
1063 | } else { |
1064 | set et_vector_alignment_reachable_saved 0 |
1065 | |
1066 | === modified file 'gcc/tree-ssa-copyrename.c' |
1067 | --- gcc/tree-ssa-copyrename.c 2011-01-20 10:36:29 +0000 |
1068 | +++ gcc/tree-ssa-copyrename.c 2011-04-13 13:28:18 +0000 |
1069 | @@ -225,6 +225,18 @@ |
1070 | ign2 = false; |
1071 | } |
1072 | |
1073 | + /* Don't coalesce if the new chosen root variable would be read-only. |
1074 | + If both ign1 && ign2, then the root var of the larger partition |
1075 | + wins, so reject in that case if any of the root vars is TREE_READONLY. |
1076 | + Otherwise reject only if the root var, on which replace_ssa_name_symbol |
1077 | + will be called below, is readonly. */ |
1078 | + if ((TREE_READONLY (root1) && ign2) || (TREE_READONLY (root2) && ign1)) |
1079 | + { |
1080 | + if (debug) |
1081 | + fprintf (debug, " : Readonly variable. No coalesce.\n"); |
1082 | + return false; |
1083 | + } |
1084 | + |
1085 | /* Don't coalesce if the two variables aren't type compatible . */ |
1086 | if (!types_compatible_p (TREE_TYPE (root1), TREE_TYPE (root2)) |
1087 | /* There is a disconnect between the middle-end type-system and |
1088 | |
1089 | === modified file 'gcc/web.c' |
1090 | --- gcc/web.c 2010-01-26 16:27:34 +0000 |
1091 | +++ gcc/web.c 2011-04-13 13:28:18 +0000 |
1092 | @@ -350,7 +350,17 @@ |
1093 | FOR_BB_INSNS (bb, insn) |
1094 | { |
1095 | unsigned int uid = INSN_UID (insn); |
1096 | - if (NONDEBUG_INSN_P (insn)) |
1097 | + |
1098 | + if (NONDEBUG_INSN_P (insn) |
1099 | + /* Ignore naked clobber. For example, reg 134 in the second insn |
1100 | + of the following sequence will not be replaced. |
1101 | + |
1102 | + (insn (clobber (reg:SI 134))) |
1103 | + |
1104 | + (insn (set (reg:SI 0 r0) (reg:SI 134))) |
1105 | + |
1106 | + Thus the later passes can optimize them away. */ |
1107 | + && GET_CODE (PATTERN (insn)) != CLOBBER) |
1108 | { |
1109 | df_ref *use_rec; |
1110 | df_ref *def_rec; |
1111 | |
1112 | === modified file 'libstdc++-v3/libsupc++/eh_arm.cc' |
1113 | --- libstdc++-v3/libsupc++/eh_arm.cc 2009-04-09 14:00:19 +0000 |
1114 | +++ libstdc++-v3/libsupc++/eh_arm.cc 2011-04-13 13:28:18 +0000 |
1115 | @@ -30,10 +30,11 @@ |
1116 | using namespace __cxxabiv1; |
1117 | |
1118 | |
1119 | -// Given the thrown type THROW_TYPE, pointer to a variable containing a |
1120 | -// pointer to the exception object THROWN_PTR_P and a type CATCH_TYPE to |
1121 | -// compare against, return whether or not there is a match and if so, |
1122 | -// update *THROWN_PTR_P. |
1123 | +// Given the thrown type THROW_TYPE, exception object UE_HEADER and a |
1124 | +// type CATCH_TYPE to compare against, return whether or not there is |
1125 | +// a match and if so, update *THROWN_PTR_P to point to either the |
1126 | +// type-matched object, or in the case of a pointer type, the object |
1127 | +// pointed to by the pointer. |
1128 | |
1129 | extern "C" __cxa_type_match_result |
1130 | __cxa_type_match(_Unwind_Exception* ue_header, |
1131 | @@ -41,51 +42,51 @@ |
1132 | bool is_reference __attribute__((__unused__)), |
1133 | void** thrown_ptr_p) |
1134 | { |
1135 | - bool forced_unwind = __is_gxx_forced_unwind_class(ue_header->exception_class); |
1136 | - bool foreign_exception = !forced_unwind && !__is_gxx_exception_class(ue_header->exception_class); |
1137 | - bool dependent_exception = |
1138 | - __is_dependent_exception(ue_header->exception_class); |
1139 | + bool forced_unwind |
1140 | + = __is_gxx_forced_unwind_class(ue_header->exception_class); |
1141 | + bool foreign_exception |
1142 | + = !forced_unwind && !__is_gxx_exception_class(ue_header->exception_class); |
1143 | + bool dependent_exception |
1144 | + = __is_dependent_exception(ue_header->exception_class); |
1145 | __cxa_exception* xh = __get_exception_header_from_ue(ue_header); |
1146 | __cxa_dependent_exception *dx = __get_dependent_exception_from_ue(ue_header); |
1147 | const std::type_info* throw_type; |
1148 | + void *thrown_ptr = 0; |
1149 | |
1150 | if (forced_unwind) |
1151 | throw_type = &typeid(abi::__forced_unwind); |
1152 | else if (foreign_exception) |
1153 | throw_type = &typeid(abi::__foreign_exception); |
1154 | - else if (dependent_exception) |
1155 | - throw_type = __get_exception_header_from_obj |
1156 | - (dx->primaryException)->exceptionType; |
1157 | else |
1158 | - throw_type = xh->exceptionType; |
1159 | - |
1160 | - void* thrown_ptr = *thrown_ptr_p; |
1161 | + { |
1162 | + if (dependent_exception) |
1163 | + xh = __get_exception_header_from_obj (dx->primaryException); |
1164 | + throw_type = xh->exceptionType; |
1165 | + // We used to require the caller set the target of thrown_ptr_p, |
1166 | + // but that's incorrect -- the EHABI makes no such requirement |
1167 | + // -- and not all callers will set it. Fortunately callers that |
1168 | + // do initialize will always pass us the value we calculate |
1169 | + // here, so there's no backwards compatibility problem. |
1170 | + thrown_ptr = __get_object_from_ue (ue_header); |
1171 | + } |
1172 | + |
1173 | + __cxa_type_match_result result = ctm_succeeded; |
1174 | |
1175 | // Pointer types need to adjust the actual pointer, not |
1176 | // the pointer to pointer that is the exception object. |
1177 | // This also has the effect of passing pointer types |
1178 | // "by value" through the __cxa_begin_catch return value. |
1179 | if (throw_type->__is_pointer_p()) |
1180 | - thrown_ptr = *(void**) thrown_ptr; |
1181 | + { |
1182 | + thrown_ptr = *(void**) thrown_ptr; |
1183 | + // We need to indicate the indirection to our caller. |
1184 | + result = ctm_succeeded_with_ptr_to_base; |
1185 | + } |
1186 | |
1187 | if (catch_type->__do_catch(throw_type, &thrown_ptr, 1)) |
1188 | { |
1189 | *thrown_ptr_p = thrown_ptr; |
1190 | - |
1191 | - if (typeid(*catch_type) == typeid (typeid(void*))) |
1192 | - { |
1193 | - const __pointer_type_info *catch_pointer_type = |
1194 | - static_cast<const __pointer_type_info *> (catch_type); |
1195 | - const __pointer_type_info *throw_pointer_type = |
1196 | - static_cast<const __pointer_type_info *> (throw_type); |
1197 | - |
1198 | - if (typeid (*catch_pointer_type->__pointee) != typeid (void) |
1199 | - && (*catch_pointer_type->__pointee != |
1200 | - *throw_pointer_type->__pointee)) |
1201 | - return ctm_succeeded_with_ptr_to_base; |
1202 | - } |
1203 | - |
1204 | - return ctm_succeeded; |
1205 | + return result; |
1206 | } |
1207 | |
1208 | return ctm_failed; |