Merge lp:~ramana/gcc-linaro/a5-a15-costs-backport-4.6 into lp:gcc-linaro/4.6
Status: Merged
Approved by: Ulrich Weigand
Approved revision: no longer in the source branch
Merged at revision: 106759
Proposed branch: lp:~ramana/gcc-linaro/a5-a15-costs-backport-4.6
Merge into: lp:gcc-linaro/4.6
Diff against target: 562 lines (+251/-66), has conflicts, 10 files modified
- ChangeLog.linaro (+80/-0)
- gcc/config/arm/arm-cores.def (+16/-15)
- gcc/config/arm/arm-protos.h (+5/-0)
- gcc/config/arm/arm-tune.md (+1/-1)
- gcc/config/arm/arm.c (+108/-23)
- gcc/config/arm/arm.h (+15/-5)
- gcc/config/arm/arm.md (+23/-1)
- gcc/config/arm/thumb2.md (+0/-20)
- gcc/doc/invoke.texi (+2/-1)
- gcc/dojump.c (+1/-0)
Text conflict in ChangeLog.linaro
To merge this branch: bzr merge lp:~ramana/gcc-linaro/a5-a15-costs-backport-4.6
Related bugs: (none)
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Linaro Toolchain Developers | | | Pending
Review via email: mp+64575@code.launchpad.net
Commit message
Description of the change
Hi,
This branch backports the cost models and the Cortex-A5 / Cortex-A15 tuning that were recently committed to trunk. The costs infrastructure should make the BRANCH_COST tuning and other cost tuning work for the A9 easier to backport from trunk, and should also let us add more tuning parameters for the A9.
Some Cortex-R5 tuning is pulled back as well, but that is a zero-cost backport since it comes by default with the A15 div instruction work.
It would be useful to get this sanity tested once with the default --with-
cheers
Ramana
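As a rough illustration of the costs infrastructure described above, the following stand-alone C sketch models how BRANCH_COST becomes a per-core hook and how max_insns_skipped and prefer_constant_pool move into the tuning table. It is not the GCC source: the struct is trimmed to the three fields this merge adds, the `*_model` names are invented for the example, and `target_32bit`, `target_thumb2` and `optimize_level` are hypothetical plain variables standing in for GCC's TARGET_32BIT, TARGET_THUMB2 and optimize. The numeric values are the ones that appear in the diff below.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's TARGET_32BIT, TARGET_THUMB2 and
   optimize; in the compiler these depend on the command line.  */
static bool target_32bit = true;
static bool target_thumb2 = false;
static int optimize_level = 2;

/* Trimmed model of struct tune_params: only the fields this merge adds.  */
struct tune_model
{
  int max_insns_skipped;            /* Max cond insns.  */
  bool prefer_constant_pool;        /* Feeds TARGET_USE_MOVT.  */
  int (*branch_cost) (bool, bool);  /* (speed_p, predictable_p).  */
};

/* Same arithmetic as the old BRANCH_COST macro.  */
static int
default_branch_cost (bool speed_p, bool predictable_p)
{
  (void) predictable_p;  /* Unused, as in the backported default hook.  */
  if (target_32bit)
    return (target_thumb2 && !speed_p) ? 1 : 4;
  return (optimize_level > 0) ? 2 : 0;
}

/* Cortex-A5: branches cost nothing when optimising for speed.  */
static int
cortex_a5_branch_cost (bool speed_p, bool predictable_p)
{
  return speed_p ? 0 : default_branch_cost (speed_p, predictable_p);
}

/* Two table entries, mirroring arm_cortex_tune and arm_cortex_a5_tune.  */
static const struct tune_model cortex_model =
  { 5, false, default_branch_cost };
static const struct tune_model cortex_a5_model =
  { 1, false, cortex_a5_branch_cost };

/* Model of the new BRANCH_COST: ask whichever tuning is selected.  */
static int
branch_cost (const struct tune_model *tune, bool speed_p, bool predictable_p)
{
  return tune->branch_cost (speed_p, predictable_p);
}
```

With this shape, adding a tuning parameter for another core (the A9, say) means adding a field and a table entry rather than another special case in the option-override code.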
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on i686-lucid-
The build results are available at:
http://
The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106756.
The full testsuite results are at:
http://
cbuild-checked: i686-lucid-
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on x86_64-
The build results are available at:
http://
The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106756.
The full testsuite results are at:
http://
cbuild-checked: x86_64-
Linaro Toolchain Builder (cbuild) wrote:
cbuild successfully built this on armv7l-
The build results are available at:
http://
The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106756.
The full testsuite results are at:
http://
cbuild-checked: armv7l-
Ulrich Weigand (uweigand) wrote:
The backport looks good to me. Given that there are no testsuite regressions either, this is OK.
Preview Diff
1 | === modified file 'ChangeLog.linaro' |
2 | --- ChangeLog.linaro 2011-06-14 14:09:57 +0000 |
3 | +++ ChangeLog.linaro 2011-06-14 16:54:35 +0000 |
4 | @@ -1,3 +1,4 @@ |
5 | +<<<<<<< TREE |
6 | 2011-06-14 Andrew Stubbs <ams@codesourcery.com> |
7 | |
8 | gcc/ |
9 | @@ -10,6 +11,85 @@ |
10 | gcc/ |
11 | * LINARO-VERSION: Update. |
12 | |
13 | +======= |
14 | +2011-06-14 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> |
15 | + |
16 | + Backport from mainline. |
17 | + 2011-06-03 Julian Brown <julian@codesourcery.com> |
18 | + |
19 | + * config/arm/arm-cores.def (strongarm, strongarm110, strongarm1100) |
20 | + (strongarm1110): Use strongarm tuning. |
21 | + * config/arm/arm-protos.h (tune_params): Add max_insns_skipped |
22 | + field. |
23 | + * config/arm/arm.c (arm_strongarm_tune): New. |
24 | + (arm_slowmul_tune, arm_fastmul_tune, arm_xscale_tune, arm_9e_tune) |
25 | + (arm_v6t2_tune, arm_cortex_tune, arm_cortex_a5_tune) |
26 | + (arm_cortex_a9_tune, arm_fa726te_tune): Add max_insns_skipped field |
27 | + setting, using previous defaults or 1 for Cortex-A5. |
28 | + (arm_option_override): Set max_insns_skipped from current tuning. |
29 | + |
30 | +2011-06-14 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> |
31 | + |
32 | + Backport from mainline. |
33 | + 2011-06-02 Julian Brown <julian@codesourcery.com> |
34 | + |
35 | + * config/arm/arm-cores.def (cortex-a5): Use cortex_a5 tuning. |
36 | + * config/arm/arm.c (arm_cortex_a5_branch_cost): New. |
37 | + (arm_cortex_a5_tune): New. |
38 | + |
39 | + 2011-06-02 Julian Brown <julian@codesourcery.com> |
40 | + |
41 | + * config/arm/arm-protos.h (tune_params): Add branch_cost hook. |
42 | + * config/arm/arm.c (arm_default_branch_cost): New. |
43 | + (arm_slowmul_tune, arm_fastmul_tune, arm_xscale_tune, arm_9e_tune) |
44 | + (arm_v6t2_tune, arm_cortex_tune, arm_cortex_a9_tune) |
45 | + (arm_fa726_tune): Set branch_cost field using |
46 | + arm_default_branch_cost. |
47 | + * config/arm/arm.h (BRANCH_COST): Use branch_cost hook from |
48 | + current_tune structure. |
49 | + * dojump.c (tm_p.h): Include file. |
50 | + |
51 | + 2011-06-02 Julian Brown <julian@codesourcery.com> |
52 | + |
53 | + * config/arm/arm-cores.def (arm1156t2-s, arm1156t2f-s): Use v6t2 |
54 | + tuning. |
55 | + (cortex-a5, cortex-a8, cortex-a15, cortex-r4, cortex-r4f, cortex-m4) |
56 | + (cortex-m3, cortex-m1, cortex-m0): Use cortex tuning. |
57 | + * config/arm/arm-protos.h (tune_params): Add prefer_constant_pool |
58 | + field. |
59 | + * config/arm/arm.c (arm_slowmul_tune, arm_fastmul_tune) |
60 | + (arm_xscale_tune, arm_9e_tune, arm_cortex_a9_tune) |
61 | + (arm_fa726te_tune): Add prefer_constant_pool setting. |
62 | + (arm_v6t2_tune, arm_cortex_tune): New. |
63 | + * config/arm/arm.h (TARGET_USE_MOVT): Make dependent on |
64 | + prefer_constant_pool setting. |
65 | + |
66 | +2011-06-14 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> |
67 | + |
68 | + Backport from mainline |
69 | + 2011-06-01 Paul Brook <paul@cpodesourcery.com> |
70 | + |
71 | + * config/arm/arm-cores.def: Add cortex-r5. Add DIV flags to |
72 | + Cortex-A15. |
73 | + * config/arm/arm-tune.md: Regenerate. |
74 | + * config/arm/arm.c (FL_DIV): Rename... |
75 | + (FL_THUMB_DIV): ... to this. |
76 | + (FL_ARM_DIV): Define. |
77 | + (FL_FOR_ARCH7R, FL_FOR_ARCH7M): Use FL_THUMB_DIV. |
78 | + (arm_arch_hwdiv): Remove. |
79 | + (arm_arch_thumb_hwdiv, arm_arch_arm_hwdiv): New variables. |
80 | + (arm_issue_rate): Add cortexr5. |
81 | + * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Set |
82 | + __ARM_ARCH_EXT_IDIV__. |
83 | + (TARGET_IDIV): Define. |
84 | + (arm_arch_hwdiv): Remove. |
85 | + (arm_arch_arm_hwdiv, arm_arch_thumb_hwdiv): New prototypes. |
86 | + * config/arm/arm.md (tune_cortexr4): Add cortexr5. |
87 | + (divsi3, udivsi3): New patterns. |
88 | + * config/arm/thumb2.md (divsi3, udivsi3): Remove. |
89 | + * doc/invoke.texi: Document ARM -mcpu=cortex-r5 |
90 | + |
91 | +>>>>>>> MERGE-SOURCE |
92 | 2011-06-13 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> |
93 | |
94 | Backport from mainline: |
95 | |
96 | === modified file 'gcc/config/arm/arm-cores.def' |
97 | --- gcc/config/arm/arm-cores.def 2011-01-03 20:52:22 +0000 |
98 | +++ gcc/config/arm/arm-cores.def 2011-06-14 16:54:35 +0000 |
99 | @@ -70,10 +70,10 @@ |
100 | /* V4 Architecture Processors */ |
101 | ARM_CORE("arm8", arm8, 4, FL_MODE26 | FL_LDSCHED, fastmul) |
102 | ARM_CORE("arm810", arm810, 4, FL_MODE26 | FL_LDSCHED, fastmul) |
103 | -ARM_CORE("strongarm", strongarm, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) |
104 | -ARM_CORE("strongarm110", strongarm110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) |
105 | -ARM_CORE("strongarm1100", strongarm1100, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) |
106 | -ARM_CORE("strongarm1110", strongarm1110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) |
107 | +ARM_CORE("strongarm", strongarm, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) |
108 | +ARM_CORE("strongarm110", strongarm110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) |
109 | +ARM_CORE("strongarm1100", strongarm1100, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) |
110 | +ARM_CORE("strongarm1110", strongarm1110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) |
111 | ARM_CORE("fa526", fa526, 4, FL_LDSCHED, fastmul) |
112 | ARM_CORE("fa626", fa626, 4, FL_LDSCHED, fastmul) |
113 | |
114 | @@ -122,15 +122,16 @@ |
115 | ARM_CORE("arm1176jzf-s", arm1176jzfs, 6ZK, FL_LDSCHED | FL_VFPV2, 9e) |
116 | ARM_CORE("mpcorenovfp", mpcorenovfp, 6K, FL_LDSCHED, 9e) |
117 | ARM_CORE("mpcore", mpcore, 6K, FL_LDSCHED | FL_VFPV2, 9e) |
118 | -ARM_CORE("arm1156t2-s", arm1156t2s, 6T2, FL_LDSCHED, 9e) |
119 | -ARM_CORE("arm1156t2f-s", arm1156t2fs, 6T2, FL_LDSCHED | FL_VFPV2, 9e) |
120 | -ARM_CORE("cortex-a5", cortexa5, 7A, FL_LDSCHED, 9e) |
121 | -ARM_CORE("cortex-a8", cortexa8, 7A, FL_LDSCHED, 9e) |
122 | +ARM_CORE("arm1156t2-s", arm1156t2s, 6T2, FL_LDSCHED, v6t2) |
123 | +ARM_CORE("arm1156t2f-s", arm1156t2fs, 6T2, FL_LDSCHED | FL_VFPV2, v6t2) |
124 | +ARM_CORE("cortex-a5", cortexa5, 7A, FL_LDSCHED, cortex_a5) |
125 | +ARM_CORE("cortex-a8", cortexa8, 7A, FL_LDSCHED, cortex) |
126 | ARM_CORE("cortex-a9", cortexa9, 7A, FL_LDSCHED, cortex_a9) |
127 | -ARM_CORE("cortex-a15", cortexa15, 7A, FL_LDSCHED, 9e) |
128 | -ARM_CORE("cortex-r4", cortexr4, 7R, FL_LDSCHED, 9e) |
129 | -ARM_CORE("cortex-r4f", cortexr4f, 7R, FL_LDSCHED, 9e) |
130 | -ARM_CORE("cortex-m4", cortexm4, 7EM, FL_LDSCHED, 9e) |
131 | -ARM_CORE("cortex-m3", cortexm3, 7M, FL_LDSCHED, 9e) |
132 | -ARM_CORE("cortex-m1", cortexm1, 6M, FL_LDSCHED, 9e) |
133 | -ARM_CORE("cortex-m0", cortexm0, 6M, FL_LDSCHED, 9e) |
134 | +ARM_CORE("cortex-a15", cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex) |
135 | +ARM_CORE("cortex-r4", cortexr4, 7R, FL_LDSCHED, cortex) |
136 | +ARM_CORE("cortex-r4f", cortexr4f, 7R, FL_LDSCHED, cortex) |
137 | +ARM_CORE("cortex-r5", cortexr5, 7R, FL_LDSCHED | FL_ARM_DIV, cortex) |
138 | +ARM_CORE("cortex-m4", cortexm4, 7EM, FL_LDSCHED, cortex) |
139 | +ARM_CORE("cortex-m3", cortexm3, 7M, FL_LDSCHED, cortex) |
140 | +ARM_CORE("cortex-m1", cortexm1, 6M, FL_LDSCHED, cortex) |
141 | +ARM_CORE("cortex-m0", cortexm0, 6M, FL_LDSCHED, cortex) |
142 | |
143 | === modified file 'gcc/config/arm/arm-protos.h' |
144 | --- gcc/config/arm/arm-protos.h 2011-05-03 15:17:25 +0000 |
145 | +++ gcc/config/arm/arm-protos.h 2011-06-14 16:54:35 +0000 |
146 | @@ -219,9 +219,14 @@ |
147 | bool (*rtx_costs) (rtx, RTX_CODE, RTX_CODE, int *, bool); |
148 | bool (*sched_adjust_cost) (rtx, rtx, rtx, int *); |
149 | int constant_limit; |
150 | + /* Maximum number of instructions to conditionalise in |
151 | + arm_final_prescan_insn. */ |
152 | + int max_insns_skipped; |
153 | int num_prefetch_slots; |
154 | int l1_cache_size; |
155 | int l1_cache_line_size; |
156 | + bool prefer_constant_pool; |
157 | + int (*branch_cost) (bool, bool); |
158 | }; |
159 | |
160 | extern const struct tune_params *current_tune; |
161 | |
162 | === modified file 'gcc/config/arm/arm-tune.md' |
163 | --- gcc/config/arm/arm-tune.md 2010-12-20 17:48:51 +0000 |
164 | +++ gcc/config/arm/arm-tune.md 2011-06-14 16:54:35 +0000 |
165 | @@ -1,5 +1,5 @@ |
166 | ;; -*- buffer-read-only: t -*- |
167 | ;; Generated automatically by gentune.sh from arm-cores.def |
168 | (define_attr "tune" |
169 | - "arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,cortexa5,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexm4,cortexm3,cortexm1,cortexm0" |
170 | + "arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,cortexa5,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexr5,cortexm4,cortexm3,cortexm1,cortexm0" |
171 | (const (symbol_ref "((enum attr_tune) arm_tune)"))) |
172 | |
173 | === modified file 'gcc/config/arm/arm.c' |
174 | --- gcc/config/arm/arm.c 2011-05-11 14:49:48 +0000 |
175 | +++ gcc/config/arm/arm.c 2011-06-14 16:54:35 +0000 |
176 | @@ -255,6 +255,8 @@ |
177 | static void arm_conditional_register_usage (void); |
178 | static reg_class_t arm_preferred_rename_class (reg_class_t rclass); |
179 | static unsigned int arm_autovectorize_vector_sizes (void); |
180 | +static int arm_default_branch_cost (bool, bool); |
181 | +static int arm_cortex_a5_branch_cost (bool, bool); |
182 | |
183 | |
184 | |
185 | /* Table of machine attributes. */ |
186 | @@ -672,12 +674,13 @@ |
187 | #define FL_THUMB2 (1 << 16) /* Thumb-2. */ |
188 | #define FL_NOTM (1 << 17) /* Instructions not present in the 'M' |
189 | profile. */ |
190 | -#define FL_DIV (1 << 18) /* Hardware divide. */ |
191 | +#define FL_THUMB_DIV (1 << 18) /* Hardware divide (Thumb mode). */ |
192 | #define FL_VFPV3 (1 << 19) /* Vector Floating Point V3. */ |
193 | #define FL_NEON (1 << 20) /* Neon instructions. */ |
194 | #define FL_ARCH7EM (1 << 21) /* Instructions present in the ARMv7E-M |
195 | architecture. */ |
196 | #define FL_ARCH7 (1 << 22) /* Architecture 7. */ |
197 | +#define FL_ARM_DIV (1 << 23) /* Hardware divide (ARM mode). */ |
198 | |
199 | #define FL_IWMMXT (1 << 29) /* XScale v2 or "Intel Wireless MMX technology". */ |
200 | |
201 | @@ -704,8 +707,8 @@ |
202 | #define FL_FOR_ARCH6M (FL_FOR_ARCH6 & ~FL_NOTM) |
203 | #define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7) |
204 | #define FL_FOR_ARCH7A (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K) |
205 | -#define FL_FOR_ARCH7R (FL_FOR_ARCH7A | FL_DIV) |
206 | -#define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_DIV) |
207 | +#define FL_FOR_ARCH7R (FL_FOR_ARCH7A | FL_THUMB_DIV) |
208 | +#define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_THUMB_DIV) |
209 | #define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) |
210 | |
211 | /* The bits in this mask specify which |
212 | @@ -791,7 +794,8 @@ |
213 | int arm_arch_thumb2; |
214 | |
215 | /* Nonzero if chip supports integer division instruction. */ |
216 | -int arm_arch_hwdiv; |
217 | +int arm_arch_arm_hwdiv; |
218 | +int arm_arch_thumb_hwdiv; |
219 | |
220 | /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference, |
221 | we must report the mode of the memory reference from |
222 | @@ -864,48 +868,117 @@ |
223 | { |
224 | arm_slowmul_rtx_costs, |
225 | NULL, |
226 | - 3, |
227 | - ARM_PREFETCH_NOT_BENEFICIAL |
228 | + 3, /* Constant limit. */ |
229 | + 5, /* Max cond insns. */ |
230 | + ARM_PREFETCH_NOT_BENEFICIAL, |
231 | + true, /* Prefer constant pool. */ |
232 | + arm_default_branch_cost |
233 | }; |
234 | |
235 | const struct tune_params arm_fastmul_tune = |
236 | { |
237 | arm_fastmul_rtx_costs, |
238 | NULL, |
239 | - 1, |
240 | - ARM_PREFETCH_NOT_BENEFICIAL |
241 | + 1, /* Constant limit. */ |
242 | + 5, /* Max cond insns. */ |
243 | + ARM_PREFETCH_NOT_BENEFICIAL, |
244 | + true, /* Prefer constant pool. */ |
245 | + arm_default_branch_cost |
246 | +}; |
247 | + |
248 | +/* StrongARM has early execution of branches, so a sequence that is worth |
249 | + skipping is shorter. Set max_insns_skipped to a lower value. */ |
250 | + |
251 | +const struct tune_params arm_strongarm_tune = |
252 | +{ |
253 | + arm_fastmul_rtx_costs, |
254 | + NULL, |
255 | + 1, /* Constant limit. */ |
256 | + 3, /* Max cond insns. */ |
257 | + ARM_PREFETCH_NOT_BENEFICIAL, |
258 | + true, /* Prefer constant pool. */ |
259 | + arm_default_branch_cost |
260 | }; |
261 | |
262 | const struct tune_params arm_xscale_tune = |
263 | { |
264 | arm_xscale_rtx_costs, |
265 | xscale_sched_adjust_cost, |
266 | - 2, |
267 | - ARM_PREFETCH_NOT_BENEFICIAL |
268 | + 2, /* Constant limit. */ |
269 | + 3, /* Max cond insns. */ |
270 | + ARM_PREFETCH_NOT_BENEFICIAL, |
271 | + true, /* Prefer constant pool. */ |
272 | + arm_default_branch_cost |
273 | }; |
274 | |
275 | const struct tune_params arm_9e_tune = |
276 | { |
277 | arm_9e_rtx_costs, |
278 | NULL, |
279 | - 1, |
280 | - ARM_PREFETCH_NOT_BENEFICIAL |
281 | + 1, /* Constant limit. */ |
282 | + 5, /* Max cond insns. */ |
283 | + ARM_PREFETCH_NOT_BENEFICIAL, |
284 | + true, /* Prefer constant pool. */ |
285 | + arm_default_branch_cost |
286 | +}; |
287 | + |
288 | +const struct tune_params arm_v6t2_tune = |
289 | +{ |
290 | + arm_9e_rtx_costs, |
291 | + NULL, |
292 | + 1, /* Constant limit. */ |
293 | + 5, /* Max cond insns. */ |
294 | + ARM_PREFETCH_NOT_BENEFICIAL, |
295 | + false, /* Prefer constant pool. */ |
296 | + arm_default_branch_cost |
297 | +}; |
298 | + |
299 | +/* Generic Cortex tuning. Use more specific tunings if appropriate. */ |
300 | +const struct tune_params arm_cortex_tune = |
301 | +{ |
302 | + arm_9e_rtx_costs, |
303 | + NULL, |
304 | + 1, /* Constant limit. */ |
305 | + 5, /* Max cond insns. */ |
306 | + ARM_PREFETCH_NOT_BENEFICIAL, |
307 | + false, /* Prefer constant pool. */ |
308 | + arm_default_branch_cost |
309 | +}; |
310 | + |
311 | +/* Branches can be dual-issued on Cortex-A5, so conditional execution is |
312 | + less appealing. Set max_insns_skipped to a low value. */ |
313 | + |
314 | +const struct tune_params arm_cortex_a5_tune = |
315 | +{ |
316 | + arm_9e_rtx_costs, |
317 | + NULL, |
318 | + 1, /* Constant limit. */ |
319 | + 1, /* Max cond insns. */ |
320 | + ARM_PREFETCH_NOT_BENEFICIAL, |
321 | + false, /* Prefer constant pool. */ |
322 | + arm_cortex_a5_branch_cost |
323 | }; |
324 | |
325 | const struct tune_params arm_cortex_a9_tune = |
326 | { |
327 | arm_9e_rtx_costs, |
328 | cortex_a9_sched_adjust_cost, |
329 | - 1, |
330 | - ARM_PREFETCH_BENEFICIAL(4,32,32) |
331 | + 1, /* Constant limit. */ |
332 | + 5, /* Max cond insns. */ |
333 | + ARM_PREFETCH_BENEFICIAL(4,32,32), |
334 | + false, /* Prefer constant pool. */ |
335 | + arm_default_branch_cost |
336 | }; |
337 | |
338 | const struct tune_params arm_fa726te_tune = |
339 | { |
340 | arm_9e_rtx_costs, |
341 | fa726te_sched_adjust_cost, |
342 | - 1, |
343 | - ARM_PREFETCH_NOT_BENEFICIAL |
344 | + 1, /* Constant limit. */ |
345 | + 5, /* Max cond insns. */ |
346 | + ARM_PREFETCH_NOT_BENEFICIAL, |
347 | + true, /* Prefer constant pool. */ |
348 | + arm_default_branch_cost |
349 | }; |
350 | |
351 | |
352 | @@ -1711,7 +1784,8 @@ |
353 | arm_tune_wbuf = (tune_flags & FL_WBUF) != 0; |
354 | arm_tune_xscale = (tune_flags & FL_XSCALE) != 0; |
355 | arm_arch_iwmmxt = (insn_flags & FL_IWMMXT) != 0; |
356 | - arm_arch_hwdiv = (insn_flags & FL_DIV) != 0; |
357 | + arm_arch_thumb_hwdiv = (insn_flags & FL_THUMB_DIV) != 0; |
358 | + arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0; |
359 | arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0; |
360 | |
361 | /* If we are not using the default (ARM mode) section anchor offset |
362 | @@ -1991,12 +2065,7 @@ |
363 | max_insns_skipped = 6; |
364 | } |
365 | else |
366 | - { |
367 | - /* StrongARM has early execution of branches, so a sequence |
368 | - that is worth skipping is shorter. */ |
369 | - if (arm_tune_strongarm) |
370 | - max_insns_skipped = 3; |
371 | - } |
372 | + max_insns_skipped = current_tune->max_insns_skipped; |
373 | |
374 | /* Hot/Cold partitioning is not currently supported, since we can't |
375 | handle literal pool placement in that case. */ |
376 | @@ -8211,6 +8280,21 @@ |
377 | return cost; |
378 | } |
379 | |
380 | +static int |
381 | +arm_default_branch_cost (bool speed_p, bool predictable_p ATTRIBUTE_UNUSED) |
382 | +{ |
383 | + if (TARGET_32BIT) |
384 | + return (TARGET_THUMB2 && !speed_p) ? 1 : 4; |
385 | + else |
386 | + return (optimize > 0) ? 2 : 0; |
387 | +} |
388 | + |
389 | +static int |
390 | +arm_cortex_a5_branch_cost (bool speed_p, bool predictable_p) |
391 | +{ |
392 | + return speed_p ? 0 : arm_default_branch_cost (speed_p, predictable_p); |
393 | +} |
394 | + |
395 | static int fp_consts_inited = 0; |
396 | |
397 | /* Only zero is valid for VFP. Other values are also valid for FPA. */ |
398 | @@ -23123,6 +23207,7 @@ |
399 | { |
400 | case cortexr4: |
401 | case cortexr4f: |
402 | + case cortexr5: |
403 | case cortexa5: |
404 | case cortexa8: |
405 | case cortexa9: |
406 | |
407 | === modified file 'gcc/config/arm/arm.h' |
408 | --- gcc/config/arm/arm.h 2011-06-02 12:12:00 +0000 |
409 | +++ gcc/config/arm/arm.h 2011-06-14 16:54:35 +0000 |
410 | @@ -101,6 +101,8 @@ |
411 | builtin_define ("__ARM_PCS"); \ |
412 | builtin_define ("__ARM_EABI__"); \ |
413 | } \ |
414 | + if (TARGET_IDIV) \ |
415 | + builtin_define ("__ARM_ARCH_EXT_IDIV__"); \ |
416 | } while (0) |
417 | |
418 | /* The various ARM cores. */ |
419 | @@ -282,7 +284,8 @@ |
420 | (TARGET_32BIT && arm_arch6 && (arm_arch_notm || arm_arch7em)) |
421 | |
422 | /* Should MOVW/MOVT be used in preference to a constant pool. */ |
423 | -#define TARGET_USE_MOVT (arm_arch_thumb2 && !optimize_size) |
424 | +#define TARGET_USE_MOVT \ |
425 | + (arm_arch_thumb2 && !optimize_size && !current_tune->prefer_constant_pool) |
426 | |
427 | /* We could use unified syntax for arm mode, but for now we just use it |
428 | for Thumb-2. */ |
429 | @@ -303,6 +306,10 @@ |
430 | /* Nonzero if this chip supports ldrex{bhd} and strex{bhd}. */ |
431 | #define TARGET_HAVE_LDREXBHD ((arm_arch6k && TARGET_ARM) || arm_arch7) |
432 | |
433 | +/* Nonzero if integer division instructions supported. */ |
434 | +#define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \ |
435 | + || (TARGET_THUMB2 && arm_arch_thumb_hwdiv)) |
436 | + |
437 | /* True iff the full BPABI is being used. If TARGET_BPABI is true, |
438 | then TARGET_AAPCS_BASED must be true -- but the converse does not |
439 | hold. TARGET_BPABI implies the use of the BPABI runtime library, |
440 | @@ -487,8 +494,11 @@ |
441 | /* Nonzero if chip supports Thumb 2. */ |
442 | extern int arm_arch_thumb2; |
443 | |
444 | -/* Nonzero if chip supports integer division instruction. */ |
445 | -extern int arm_arch_hwdiv; |
446 | +/* Nonzero if chip supports integer division instruction in ARM mode. */ |
447 | +extern int arm_arch_arm_hwdiv; |
448 | + |
449 | +/* Nonzero if chip supports integer division instruction in Thumb mode. */ |
450 | +extern int arm_arch_thumb_hwdiv; |
451 | |
452 | #ifndef TARGET_DEFAULT |
453 | #define TARGET_DEFAULT (MASK_APCS_FRAME) |
454 | @@ -2018,8 +2028,8 @@ |
455 | /* Try to generate sequences that don't involve branches, we can then use |
456 | conditional instructions */ |
457 | #define BRANCH_COST(speed_p, predictable_p) \ |
458 | - (TARGET_32BIT ? (TARGET_THUMB2 && !speed_p ? 1 : 4) \ |
459 | - : (optimize > 0 ? 2 : 0)) |
460 | + (current_tune->branch_cost (speed_p, predictable_p)) |
461 | + |
462 | |
463 | |
464 | /* Position Independent Code. */ |
465 | /* We decide which register to use based on the compilation options and |
466 | |
467 | === modified file 'gcc/config/arm/arm.md' |
468 | --- gcc/config/arm/arm.md 2011-06-02 15:58:33 +0000 |
469 | +++ gcc/config/arm/arm.md 2011-06-14 16:54:35 +0000 |
470 | @@ -490,7 +490,7 @@ |
471 | |
472 | (define_attr "tune_cortexr4" "yes,no" |
473 | (const (if_then_else |
474 | - (eq_attr "tune" "cortexr4,cortexr4f") |
475 | + (eq_attr "tune" "cortexr4,cortexr4f,cortexr5") |
476 | (const_string "yes") |
477 | (const_string "no")))) |
478 | |
479 | @@ -3738,6 +3738,28 @@ |
480 | (set_attr "predicable" "yes")] |
481 | ) |
482 | |
483 | + |
484 | +;; Division instructions |
485 | +(define_insn "divsi3" |
486 | + [(set (match_operand:SI 0 "s_register_operand" "=r") |
487 | + (div:SI (match_operand:SI 1 "s_register_operand" "r") |
488 | + (match_operand:SI 2 "s_register_operand" "r")))] |
489 | + "TARGET_IDIV" |
490 | + "sdiv%?\t%0, %1, %2" |
491 | + [(set_attr "predicable" "yes") |
492 | + (set_attr "insn" "sdiv")] |
493 | +) |
494 | + |
495 | +(define_insn "udivsi3" |
496 | + [(set (match_operand:SI 0 "s_register_operand" "=r") |
497 | + (udiv:SI (match_operand:SI 1 "s_register_operand" "r") |
498 | + (match_operand:SI 2 "s_register_operand" "r")))] |
499 | + "TARGET_IDIV" |
500 | + "udiv%?\t%0, %1, %2" |
501 | + [(set_attr "predicable" "yes") |
502 | + (set_attr "insn" "udiv")] |
503 | +) |
504 | + |
505 | |
506 | |
507 | ;; Unary arithmetic insns |
508 | |
509 | |
510 | === modified file 'gcc/config/arm/thumb2.md' |
511 | --- gcc/config/arm/thumb2.md 2011-05-11 07:15:47 +0000 |
512 | +++ gcc/config/arm/thumb2.md 2011-06-14 16:54:35 +0000 |
513 | @@ -779,26 +779,6 @@ |
514 | (set_attr "length" "2")] |
515 | ) |
516 | |
517 | -(define_insn "divsi3" |
518 | - [(set (match_operand:SI 0 "s_register_operand" "=r") |
519 | - (div:SI (match_operand:SI 1 "s_register_operand" "r") |
520 | - (match_operand:SI 2 "s_register_operand" "r")))] |
521 | - "TARGET_THUMB2 && arm_arch_hwdiv" |
522 | - "sdiv%?\t%0, %1, %2" |
523 | - [(set_attr "predicable" "yes") |
524 | - (set_attr "insn" "sdiv")] |
525 | -) |
526 | - |
527 | -(define_insn "udivsi3" |
528 | - [(set (match_operand:SI 0 "s_register_operand" "=r") |
529 | - (udiv:SI (match_operand:SI 1 "s_register_operand" "r") |
530 | - (match_operand:SI 2 "s_register_operand" "r")))] |
531 | - "TARGET_THUMB2 && arm_arch_hwdiv" |
532 | - "udiv%?\t%0, %1, %2" |
533 | - [(set_attr "predicable" "yes") |
534 | - (set_attr "insn" "udiv")] |
535 | -) |
536 | - |
537 | (define_insn "*thumb2_subsi_short" |
538 | [(set (match_operand:SI 0 "low_register_operand" "=l") |
539 | (minus:SI (match_operand:SI 1 "low_register_operand" "l") |
540 | |
541 | === modified file 'gcc/doc/invoke.texi' |
542 | --- gcc/doc/invoke.texi 2011-05-11 07:15:47 +0000 |
543 | +++ gcc/doc/invoke.texi 2011-06-14 16:54:35 +0000 |
544 | @@ -10208,7 +10208,8 @@ |
545 | @samp{arm1136j-s}, @samp{arm1136jf-s}, @samp{mpcore}, @samp{mpcorenovfp}, |
546 | @samp{arm1156t2-s}, @samp{arm1156t2f-s}, @samp{arm1176jz-s}, @samp{arm1176jzf-s}, |
547 | @samp{cortex-a5}, @samp{cortex-a8}, @samp{cortex-a9}, @samp{cortex-a15}, |
548 | -@samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-m4}, @samp{cortex-m3}, |
549 | +@samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-r5}, |
550 | +@samp{cortex-m4}, @samp{cortex-m3}, |
551 | @samp{cortex-m1}, |
552 | @samp{cortex-m0}, |
553 | @samp{xscale}, @samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}. |
554 | |
555 | === modified file 'gcc/dojump.c' |
556 | --- gcc/dojump.c 2010-05-19 19:09:57 +0000 |
557 | +++ gcc/dojump.c 2011-06-14 16:54:35 +0000 |
558 | @@ -36,6 +36,7 @@ |
559 | #include "ggc.h" |
560 | #include "basic-block.h" |
561 | #include "output.h" |
562 | +#include "tm_p.h" |
563 | |
564 | static bool prefer_and_bit_test (enum machine_mode, int); |
565 | static void do_jump_by_parts_greater (tree, tree, int, rtx, rtx, int); |
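The hardware-divide part of the diff splits the old FL_DIV flag into separate Thumb and ARM bits and gates the divsi3/udivsi3 patterns on TARGET_IDIV. The sketch below is a simplified stand-alone model of that predicate, not the GCC code: `target_idiv`, `in_arm_mode`, `in_thumb2_mode` and `idiv_advertised` are names invented for the example, with the two mode arguments standing in for TARGET_ARM and TARGET_THUMB2. The bit positions are copied from the arm.c hunk above. The second function shows how user code might test the new `__ARM_ARCH_EXT_IDIV__` predefine.

```c
#include <stdbool.h>

/* Bit values copied from the arm.c hunk.  */
#define FL_THUMB_DIV (1u << 18)  /* Hardware divide (Thumb mode).  */
#define FL_ARM_DIV   (1u << 23)  /* Hardware divide (ARM mode).  */

/* Model of TARGET_IDIV.  The two bool arguments are hypothetical
   stand-ins for TARGET_ARM and TARGET_THUMB2.  */
static bool
target_idiv (unsigned int insn_flags, bool in_arm_mode, bool in_thumb2_mode)
{
  bool arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
  bool thumb_hwdiv = (insn_flags & FL_THUMB_DIV) != 0;

  return (in_arm_mode && arm_hwdiv) || (in_thumb2_mode && thumb_hwdiv);
}

/* User-side check: the patched compiler defines __ARM_ARCH_EXT_IDIV__
   whenever TARGET_IDIV holds for the current compilation.  */
static const char *
idiv_advertised (void)
{
#ifdef __ARM_ARCH_EXT_IDIV__
  return "yes";
#else
  return "no";
#endif
}
```

In the diff, ARMv7-R and ARMv7-M parts pick up FL_THUMB_DIV through FL_FOR_ARCH7R and FL_FOR_ARCH7M, while the cortex-a15 and cortex-r5 entries in arm-cores.def add FL_ARM_DIV, so only those cores would satisfy the ARM-mode half of the predicate.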
cbuild has taken a snapshot of this branch at r106759 and queued it for build.
The snapshot is available at:
http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106759~ramana~a5-a15-costs-backport-4.6.tar.xdelta3.xz
and will be built on the following builders:
a9-builder i686 x86_64
You can track the build queue at:
http://ex.seabright.co.nz/helpers/scheduler
cbuild-snapshot: gcc-linaro-4.6+bzr106759~ramana~a5-a15-costs-backport-4.6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106756
cbuild-state: check