Merge lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6 into lp:gcc-linaro/4.6
- sms_doloop_for_ARM_new_version-4_6
- Merge into 4.6
Status: | Merged |
---|---|
Approved by: | Richard Sandiford |
Approved revision: | no longer in the source branch. |
Merged at revision: | 106747 |
Proposed branch: | lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6 |
Merge into: | lp:gcc-linaro/4.6 |
Diff against target: |
647 lines (+303/-75) 8 files modified
ChangeLog.linaro (+30/-0) gcc/config/arm/arm.md (+1/-1) gcc/config/arm/thumb2.md (+52/-1) gcc/ddg.c (+5/-0) gcc/doc/invoke.texi (+4/-0) gcc/loop-doloop.c (+67/-7) gcc/modulo-sched.c (+139/-66) gcc/params.def (+5/-0) |
To merge this branch: | bzr merge lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Linaro Toolchain Developers | Pending | ||
Review via email: mp+60601@code.launchpad.net |
This proposal supersedes a proposal from 2011-05-11.
Commit message
Description of the change
Implement doloop pattern for ARM and support it in SMS.
The change consists of these patches that sent upstream:
1) Doloop pattern for ARM:
http://
2) Support closing_
http://
3) New flag to apply SMS when SC equals 1
http://
Linaro Toolchain Builder (cbuild) wrote : | # |
Linaro Toolchain Builder (cbuild) wrote : | # |
cbuild successfully built this on armv7l-
The build results are available at:
http://
The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106744:
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
+PASS: gcc.c-torture/
The full testsuite results are at:
http://
cbuild-checked: armv7l-
Linaro Toolchain Builder (cbuild) wrote : | # |
cbuild successfully built this on i686-lucid-
The build results are available at:
http://
The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106744.
The full testsuite results are at:
http://
cbuild-checked: i686-lucid-
Richard Sandiford (rsandifo) wrote : | # |
OK.
Preview Diff
1 | === modified file 'ChangeLog.linaro' |
2 | --- ChangeLog.linaro 2011-05-06 11:33:02 +0000 |
3 | +++ ChangeLog.linaro 2011-05-11 10:02:07 +0000 |
4 | @@ -1,3 +1,33 @@ |
5 | +2011-05-11 Revital Eres <revital.eres@linaro.org> |
6 | + |
7 | + gcc/ |
8 | + * loop-doloop.c (doloop_condition_get): Support new form of |
9 | + doloop pattern and use prev_nondebug_insn instead of PREV_INSN. |
10 | + * config/arm/thumb2.md (*thumb2_addsi3_compare0): Remove "*". |
11 | + (doloop_end): New. |
12 | + * config/arm/arm.md (*addsi3_compare0): Remove "*". |
13 | + * params.def (sms-min-sc): New param flag. |
14 | + * doc/invoke.texi (sms-min-sc): Document it. |
15 | + * ddg.c (create_ddg_dep_from_intra_loop_link): If a true dep edge |
16 | + enters the branch create an anti edge in the opposite direction |
17 | + to prevent the creation of reg-moves. |
18 | + * modulo-sched.c: Adjust comment to reflect the fact we are |
19 | + scheduling closing branch. |
20 | + (PS_STAGE_COUNT): Rename to CALC_STAGE_COUNT and redefine. |
21 | + (stage_count): New field in struct partial_schedule. |
22 | + (calculate_stage_count): New function. |
23 | + (normalize_sched_times): Rename to reset_sched_times and handle |
24 | + incrementing the sched time of the nodes by a constant value |
25 | + passed as parameter. |
26 | + (duplicate_insns_of_cycles): Skip closing branch. |
27 | + (sms_schedule_by_order): Schedule closing branch. |
28 | + (ps_insn_find_column): Handle closing branch. |
29 | + (sms_schedule): Call reset_sched_times and adjust the code to |
30 | + support scheduling of the closing branch. Use sms-min-sc. |
31 | + Support new form of doloop pattern. |
32 | + (ps_insert_empty_row): Update calls to normalize_sched_times |
33 | + and rotate_partial_schedule functions. |
34 | + |
35 | 2011-05-06 Richard Sandiford <richard.sandiford@linaro.org> |
36 | |
37 | gcc/ |
38 | |
39 | === modified file 'gcc/config/arm/arm.md' |
40 | --- gcc/config/arm/arm.md 2011-04-13 14:06:16 +0000 |
41 | +++ gcc/config/arm/arm.md 2011-05-11 10:02:07 +0000 |
42 | @@ -791,7 +791,7 @@ |
43 | "" |
44 | ) |
45 | |
46 | -(define_insn "*addsi3_compare0" |
47 | +(define_insn "addsi3_compare0" |
48 | [(set (reg:CC_NOOV CC_REGNUM) |
49 | (compare:CC_NOOV |
50 | (plus:SI (match_operand:SI 1 "s_register_operand" "r, r") |
51 | |
52 | === modified file 'gcc/config/arm/thumb2.md' |
53 | --- gcc/config/arm/thumb2.md 2011-01-03 20:52:22 +0000 |
54 | +++ gcc/config/arm/thumb2.md 2011-05-11 10:02:07 +0000 |
55 | @@ -836,7 +836,7 @@ |
56 | "operands[4] = GEN_INT (- INTVAL (operands[2]));" |
57 | ) |
58 | |
59 | -(define_insn "*thumb2_addsi3_compare0" |
60 | +(define_insn "thumb2_addsi3_compare0" |
61 | [(set (reg:CC_NOOV CC_REGNUM) |
62 | (compare:CC_NOOV |
63 | (plus:SI (match_operand:SI 1 "s_register_operand" "l, 0, r") |
64 | @@ -1118,3 +1118,54 @@ |
65 | " |
66 | operands[2] = GEN_INT (32 - INTVAL (operands[2])); |
67 | ") |
68 | + |
69 | +;; Define the subtract-one-and-jump insns so loop.c |
70 | +;; knows what to generate. |
71 | +(define_expand "doloop_end" |
72 | + [(use (match_operand 0 "" "")) ; loop pseudo |
73 | + (use (match_operand 1 "" "")) ; iterations; zero if unknown |
74 | + (use (match_operand 2 "" "")) ; max iterations |
75 | + (use (match_operand 3 "" "")) ; loop level |
76 | + (use (match_operand 4 "" ""))] ; label |
77 | + "TARGET_32BIT" |
78 | + " |
79 | + { |
80 | + /* Currently SMS relies on the do-loop pattern to recognize loops |
81 | + where (1) the control part consists of all insns defining and/or |
82 | + using a certain 'count' register and (2) the loop count can be |
83 | + adjusted by modifying this register prior to the loop. |
84 | + ??? The possible introduction of a new block to initialize the |
85 | + new IV can potentially affect branch optimizations. */ |
86 | + if (optimize > 0 && flag_modulo_sched) |
87 | + { |
88 | + rtx s0; |
89 | + rtx bcomp; |
90 | + rtx loc_ref; |
91 | + rtx cc_reg; |
92 | + rtx insn; |
93 | + rtx cmp; |
94 | + |
95 | + /* Only use this on innermost loops. */ |
96 | + if (INTVAL (operands[3]) > 1) |
97 | + FAIL; |
98 | + if (GET_MODE (operands[0]) != SImode) |
99 | + FAIL; |
100 | + |
101 | + s0 = operands [0]; |
102 | + if (TARGET_THUMB2) |
103 | + insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1))); |
104 | + else |
105 | + insn = emit_insn (gen_addsi3_compare0 (s0, s0, GEN_INT (-1))); |
106 | + |
107 | + cmp = XVECEXP (PATTERN (insn), 0, 0); |
108 | + cc_reg = SET_DEST (cmp); |
109 | + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); |
110 | + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [4]); |
111 | + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, |
112 | + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, |
113 | + loc_ref, pc_rtx))); |
114 | + DONE; |
115 | + }else |
116 | + FAIL; |
117 | +}") |
118 | + |
119 | |
120 | === modified file 'gcc/ddg.c' |
121 | --- gcc/ddg.c 2010-11-30 11:41:24 +0000 |
122 | +++ gcc/ddg.c 2011-05-11 10:02:07 +0000 |
123 | @@ -197,6 +197,11 @@ |
124 | } |
125 | } |
126 | |
127 | + /* If a true dep edge enters the branch create an anti edge in the |
128 | + opposite direction to prevent the creation of reg-moves. */ |
129 | + if ((DEP_TYPE (link) == REG_DEP_TRUE) && JUMP_P (dest_node->insn)) |
130 | + create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1); |
131 | + |
132 | latency = dep_cost (link); |
133 | e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance); |
134 | add_edge_to_ddg (g, e); |
135 | |
136 | === modified file 'gcc/doc/invoke.texi' |
137 | --- gcc/doc/invoke.texi 2011-04-18 11:31:29 +0000 |
138 | +++ gcc/doc/invoke.texi 2011-05-11 10:02:07 +0000 |
139 | @@ -8730,6 +8730,10 @@ |
140 | The maximum number of best instructions in the ready list that are considered |
141 | for renaming in the selective scheduler. The default value is 2. |
142 | |
143 | +@item sms-min-sc |
144 | +The minimum value of stage count that swing modulo scheduler will |
145 | +generate. The default value is 2. |
146 | + |
147 | @item max-last-value-rtl |
148 | The maximum size measured as number of RTLs that can be recorded in an expression |
149 | in combiner for a pseudo register as last known value of that register. The default |
150 | |
151 | === modified file 'gcc/loop-doloop.c' |
152 | --- gcc/loop-doloop.c 2010-11-30 11:41:24 +0000 |
153 | +++ gcc/loop-doloop.c 2011-05-11 10:02:07 +0000 |
154 | @@ -78,6 +78,8 @@ |
155 | rtx inc_src; |
156 | rtx condition; |
157 | rtx pattern; |
158 | + rtx cc_reg = NULL_RTX; |
159 | + rtx reg_orig = NULL_RTX; |
160 | |
161 | /* The canonical doloop pattern we expect has one of the following |
162 | forms: |
163 | @@ -96,7 +98,16 @@ |
164 | 2) (set (reg) (plus (reg) (const_int -1)) |
165 | (set (pc) (if_then_else (reg != 0) |
166 | (label_ref (label)) |
167 | - (pc))). */ |
168 | + (pc))). |
169 | + |
170 | + Some targets (ARM) do the comparison before the branch, as in the |
171 | + following form: |
172 | + |
173 | + 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0))) |
174 | + (set (reg) (plus (reg) (const_int -1)))]) |
175 | + (set (pc) (if_then_else (cc == NE) |
176 | + (label_ref (label)) |
177 | + (pc))) */ |
178 | |
179 | pattern = PATTERN (doloop_pat); |
180 | |
181 | @@ -104,19 +115,47 @@ |
182 | { |
183 | rtx cond; |
184 | rtx prev_insn = prev_nondebug_insn (doloop_pat); |
185 | + rtx cmp_arg1, cmp_arg2; |
186 | + rtx cmp_orig; |
187 | |
188 | - /* We expect the decrement to immediately precede the branch. */ |
189 | + /* In case the pattern is not PARALLEL we expect two forms |
190 | + of doloop which are cases 2) and 3) above: in case 2) the |
191 | + decrement immediately precedes the branch, while in case 3) |
192 | + the compare and decrement instructions immediately precede |
193 | + the branch. */ |
194 | |
195 | if (prev_insn == NULL_RTX || !INSN_P (prev_insn)) |
196 | return 0; |
197 | |
198 | cmp = pattern; |
199 | - inc = PATTERN (PREV_INSN (doloop_pat)); |
200 | + if (GET_CODE (PATTERN (prev_insn)) == PARALLEL) |
201 | + { |
202 | + /* The third case: the compare and decrement instructions |
203 | + immediately precede the branch. */ |
204 | + cmp_orig = XVECEXP (PATTERN (prev_insn), 0, 0); |
205 | + if (GET_CODE (cmp_orig) != SET) |
206 | + return 0; |
207 | + if (GET_CODE (SET_SRC (cmp_orig)) != COMPARE) |
208 | + return 0; |
209 | + cmp_arg1 = XEXP (SET_SRC (cmp_orig), 0); |
210 | + cmp_arg2 = XEXP (SET_SRC (cmp_orig), 1); |
211 | + if (cmp_arg2 != const0_rtx |
212 | + || GET_CODE (cmp_arg1) != PLUS) |
213 | + return 0; |
214 | + reg_orig = XEXP (cmp_arg1, 0); |
215 | + if (XEXP (cmp_arg1, 1) != GEN_INT (-1) |
216 | + || !REG_P (reg_orig)) |
217 | + return 0; |
218 | + cc_reg = SET_DEST (cmp_orig); |
219 | + |
220 | + inc = XVECEXP (PATTERN (prev_insn), 0, 1); |
221 | + } |
222 | + else |
223 | + inc = PATTERN (prev_insn); |
224 | /* We expect the condition to be of the form (reg != 0) */ |
225 | cond = XEXP (SET_SRC (cmp), 0); |
226 | if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx) |
227 | return 0; |
228 | - |
229 | } |
230 | else |
231 | { |
232 | @@ -162,11 +201,15 @@ |
233 | return 0; |
234 | |
235 | if ((XEXP (condition, 0) == reg) |
236 | + /* For the third case: */ |
237 | + || ((cc_reg != NULL_RTX) |
238 | + && (XEXP (condition, 0) == cc_reg) |
239 | + && (reg_orig == reg)) |
240 | || (GET_CODE (XEXP (condition, 0)) == PLUS |
241 | - && XEXP (XEXP (condition, 0), 0) == reg)) |
242 | + && XEXP (XEXP (condition, 0), 0) == reg)) |
243 | { |
244 | if (GET_CODE (pattern) != PARALLEL) |
245 | - /* The second form we expect: |
246 | + /* For the second form we expect: |
247 | |
248 | (set (reg) (plus (reg) (const_int -1)) |
249 | (set (pc) (if_then_else (reg != 0) |
250 | @@ -181,7 +224,24 @@ |
251 | (set (reg) (plus (reg) (const_int -1))) |
252 | (additional clobbers and uses)]) |
253 | |
254 | - So we return that form instead. |
255 | + For the third form we expect: |
256 | + |
257 | + (parallel [(set (cc) (compare ((plus (reg) (const_int -1)), 0)) |
258 | + (set (reg) (plus (reg) (const_int -1)))]) |
259 | + (set (pc) (if_then_else (cc == NE) |
260 | + (label_ref (label)) |
261 | + (pc))) |
262 | + |
263 | + which is equivalent to the following: |
264 | + |
265 | + (parallel [(set (cc) (compare (reg, 1)) |
266 | + (set (reg) (plus (reg) (const_int -1))) |
267 | + (set (pc) (if_then_else (NE == cc) |
268 | + (label_ref (label)) |
269 | + (pc))))]) |
270 | + |
271 | + So we return the second form instead for the two cases. |
272 | + |
273 | */ |
274 | condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx); |
275 | |
276 | |
277 | === modified file 'gcc/modulo-sched.c' |
278 | --- gcc/modulo-sched.c 2011-02-14 17:59:10 +0000 |
279 | +++ gcc/modulo-sched.c 2011-05-11 10:02:07 +0000 |
280 | @@ -84,14 +84,13 @@ |
281 | II cycles (i.e. use register copies to prevent a def from overwriting |
282 | itself before reaching the use). |
283 | |
284 | - SMS works with countable loops (1) whose control part can be easily |
285 | - decoupled from the rest of the loop and (2) whose loop count can |
286 | - be easily adjusted. This is because we peel a constant number of |
287 | - iterations into a prologue and epilogue for which we want to avoid |
288 | - emitting the control part, and a kernel which is to iterate that |
289 | - constant number of iterations less than the original loop. So the |
290 | - control part should be a set of insns clearly identified and having |
291 | - its own iv, not otherwise used in the loop (at-least for now), which |
292 | + SMS works with countable loops whose loop count can be easily |
293 | + adjusted. This is because we peel a constant number of iterations |
294 | + into a prologue and epilogue for which we want to avoid emitting |
295 | + the control part, and a kernel which is to iterate that constant |
296 | + number of iterations less than the original loop. So the control |
297 | + part should be a set of insns clearly identified and having its |
298 | + own iv, not otherwise used in the loop (at-least for now), which |
299 | initializes a register before the loop to the number of iterations. |
300 | Currently SMS relies on the do-loop pattern to recognize such loops, |
301 | where (1) the control part comprises of all insns defining and/or |
302 | @@ -116,8 +115,10 @@ |
303 | |
304 | /* The number of different iterations the nodes in ps span, assuming |
305 | the stage boundaries are placed efficiently. */ |
306 | -#define PS_STAGE_COUNT(ps) ((PS_MAX_CYCLE (ps) - PS_MIN_CYCLE (ps) \ |
307 | - + 1 + (ps)->ii - 1) / (ps)->ii) |
308 | +#define CALC_STAGE_COUNT(max_cycle,min_cycle,ii) ((max_cycle - min_cycle \ |
309 | + + 1 + ii - 1) / ii) |
310 | +/* The stage count of ps. */ |
311 | +#define PS_STAGE_COUNT(ps) (((partial_schedule_ptr)(ps))->stage_count) |
312 | |
313 | /* A single instruction in the partial schedule. */ |
314 | struct ps_insn |
315 | @@ -155,6 +156,8 @@ |
316 | int max_cycle; |
317 | |
318 | ddg_ptr g; /* The DDG of the insns in the partial schedule. */ |
319 | + |
320 | + int stage_count; /* The stage count of the partial schedule. */ |
321 | }; |
322 | |
323 | /* We use this to record all the register replacements we do in |
324 | @@ -195,7 +198,7 @@ |
325 | rtx, rtx); |
326 | static void duplicate_insns_of_cycles (partial_schedule_ptr, |
327 | int, int, int, rtx); |
328 | - |
329 | +static int calculate_stage_count (partial_schedule_ptr ps); |
330 | #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap) |
331 | #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time) |
332 | #define SCHED_FIRST_REG_MOVE(x) \ |
333 | @@ -310,10 +313,10 @@ |
334 | either a single (parallel) branch-on-count or a (non-parallel) |
335 | branch immediately preceded by a single (decrement) insn. */ |
336 | first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail |
337 | - : PREV_INSN (tail)); |
338 | + : prev_nondebug_insn (tail)); |
339 | |
340 | for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn)) |
341 | - if (reg_mentioned_p (reg, insn)) |
342 | + if (reg_mentioned_p (reg, insn) && !DEBUG_INSN_P (insn)) |
343 | { |
344 | if (dump_file) |
345 | { |
346 | @@ -569,13 +572,12 @@ |
347 | } |
348 | } |
349 | |
350 | -/* Bump the SCHED_TIMEs of all nodes to start from zero. Set the values |
351 | - of SCHED_ROW and SCHED_STAGE. */ |
352 | +/* Bump the SCHED_TIMEs of all nodes by AMOUNT. Set the values of |
353 | + SCHED_ROW and SCHED_STAGE. */ |
354 | static void |
355 | -normalize_sched_times (partial_schedule_ptr ps) |
356 | +reset_sched_times (partial_schedule_ptr ps, int amount) |
357 | { |
358 | int row; |
359 | - int amount = PS_MIN_CYCLE (ps); |
360 | int ii = ps->ii; |
361 | ps_insn_ptr crr_insn; |
362 | |
363 | @@ -584,19 +586,43 @@ |
364 | { |
365 | ddg_node_ptr u = crr_insn->node; |
366 | int normalized_time = SCHED_TIME (u) - amount; |
367 | + int new_min_cycle = PS_MIN_CYCLE (ps) - amount; |
368 | + int sc_until_cycle_zero, stage; |
369 | |
370 | - if (dump_file) |
371 | - fprintf (dump_file, "crr_insn->node=%d, crr_insn->cycle=%d,\ |
372 | - min_cycle=%d\n", crr_insn->node->cuid, SCHED_TIME |
373 | - (u), ps->min_cycle); |
374 | + if (dump_file) |
375 | + { |
376 | + /* Print the scheduling times after the rotation. */ |
377 | + fprintf (dump_file, "crr_insn->node=%d (insn id %d), " |
378 | + "crr_insn->cycle=%d, min_cycle=%d", crr_insn->node->cuid, |
379 | + INSN_UID (crr_insn->node->insn), SCHED_TIME (u), |
380 | + normalized_time); |
381 | + if (JUMP_P (crr_insn->node->insn)) |
382 | + fprintf (dump_file, " (branch)"); |
383 | + fprintf (dump_file, "\n"); |
384 | + } |
385 | + |
386 | gcc_assert (SCHED_TIME (u) >= ps->min_cycle); |
387 | gcc_assert (SCHED_TIME (u) <= ps->max_cycle); |
388 | SCHED_TIME (u) = normalized_time; |
389 | - SCHED_ROW (u) = normalized_time % ii; |
390 | - SCHED_STAGE (u) = normalized_time / ii; |
391 | + SCHED_ROW (u) = SMODULO (normalized_time, ii); |
392 | + |
393 | + /* The calculation of stage count is done adding the number |
394 | + of stages before cycle zero and after cycle zero. */ |
395 | + sc_until_cycle_zero = CALC_STAGE_COUNT (-1, new_min_cycle, ii); |
396 | + |
397 | + if (SCHED_TIME (u) < 0) |
398 | + { |
399 | + stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii); |
400 | + SCHED_STAGE (u) = sc_until_cycle_zero - stage; |
401 | + } |
402 | + else |
403 | + { |
404 | + stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii); |
405 | + SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1; |
406 | + } |
407 | } |
408 | } |
409 | - |
410 | + |
411 | /* Set SCHED_COLUMN of each node according to its position in PS. */ |
412 | static void |
413 | set_columns_for_ps (partial_schedule_ptr ps) |
414 | @@ -646,9 +672,12 @@ |
415 | |
416 | /* Do not duplicate any insn which refers to count_reg as it |
417 | belongs to the control part. |
418 | + The closing branch is scheduled as well and thus should |
419 | + be ignored. |
420 | TODO: This should be done by analyzing the control part of |
421 | the loop. */ |
422 | - if (reg_mentioned_p (count_reg, u_node->insn)) |
423 | + if (reg_mentioned_p (count_reg, u_node->insn) |
424 | + || JUMP_P (ps_ij->node->insn)) |
425 | continue; |
426 | |
427 | if (for_prolog) |
428 | @@ -1009,9 +1038,11 @@ |
429 | continue; |
430 | } |
431 | |
432 | - /* Don't handle BBs with calls or barriers, or !single_set insns, |
433 | - or auto-increment insns (to avoid creating invalid reg-moves |
434 | - for the auto-increment insns). |
435 | + /* Don't handle BBs with calls or barriers or auto-increment insns |
436 | + (to avoid creating invalid reg-moves for the auto-increment insns), |
437 | + or !single_set with the exception of instructions that include |
438 | + count_reg---these instructions are part of the control part |
439 | + that do-loop recognizes. |
440 | ??? Should handle auto-increment insns. |
441 | ??? Should handle insns defining subregs. */ |
442 | for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn)) |
443 | @@ -1021,7 +1052,8 @@ |
444 | if (CALL_P (insn) |
445 | || BARRIER_P (insn) |
446 | || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn) |
447 | - && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE) |
448 | + && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE |
449 | + && !reg_mentioned_p (count_reg, insn)) |
450 | || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0) |
451 | || (INSN_P (insn) && (set = single_set (insn)) |
452 | && GET_CODE (SET_DEST (set)) == SUBREG)) |
453 | @@ -1049,7 +1081,11 @@ |
454 | continue; |
455 | } |
456 | |
457 | - if (! (g = create_ddg (bb, 0))) |
458 | + /* Always schedule the closing branch with the rest of the |
459 | + instructions. The branch is rotated to be in row ii-1 at the |
460 | + end of the scheduling procedure to make sure it's the last |
461 | + instruction in the iteration. */ |
462 | + if (! (g = create_ddg (bb, 1))) |
463 | { |
464 | if (dump_file) |
465 | fprintf (dump_file, "SMS create_ddg failed\n"); |
466 | @@ -1157,14 +1193,17 @@ |
467 | |
468 | ps = sms_schedule_by_order (g, mii, maxii, node_order); |
469 | |
470 | - if (ps){ |
471 | - stage_count = PS_STAGE_COUNT (ps); |
472 | - gcc_assert(stage_count >= 1); |
473 | - } |
474 | + if (ps) |
475 | + { |
476 | + stage_count = calculate_stage_count (ps); |
477 | + gcc_assert(stage_count >= 1); |
478 | + PS_STAGE_COUNT(ps) = stage_count; |
479 | + } |
480 | |
481 | - /* Stage count of 1 means that there is no interleaving between |
482 | - iterations, let the scheduling passes do the job. */ |
483 | - if (stage_count <= 1 |
484 | + /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of |
485 | + 1 means that there is no interleaving between iterations thus |
486 | + we let the scheduling passes do the job in this case. */ |
487 | + if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC) |
488 | || (count_init && (loop_count <= stage_count)) |
489 | || (flag_branch_probabilities && (trip_count <= stage_count))) |
490 | { |
491 | @@ -1182,32 +1221,24 @@ |
492 | else |
493 | { |
494 | struct undo_replace_buff_elem *reg_move_replaces; |
495 | - |
496 | - if (dump_file) |
497 | - { |
498 | + int amount = SCHED_TIME (g->closing_branch) + 1; |
499 | + |
500 | + /* Set the stage boundaries. The closing_branch was scheduled |
501 | + and should appear in the last (ii-1) row. */ |
502 | + reset_sched_times (ps, amount); |
503 | + rotate_partial_schedule (ps, amount); |
504 | + set_columns_for_ps (ps); |
505 | + |
506 | + canon_loop (loop); |
507 | + |
508 | + if (dump_file) |
509 | + { |
510 | fprintf (dump_file, |
511 | "SMS succeeded %d %d (with ii, sc)\n", ps->ii, |
512 | stage_count); |
513 | print_partial_schedule (ps, dump_file); |
514 | - fprintf (dump_file, |
515 | - "SMS Branch (%d) will later be scheduled at cycle %d.\n", |
516 | - g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1); |
517 | } |
518 | - |
519 | - /* Set the stage boundaries. If the DDG is built with closing_branch_deps, |
520 | - the closing_branch was scheduled and should appear in the last (ii-1) |
521 | - row. Otherwise, we are free to schedule the branch, and we let nodes |
522 | - that were scheduled at the first PS_MIN_CYCLE cycle appear in the first |
523 | - row; this should reduce stage_count to minimum. |
524 | - TODO: Revisit the issue of scheduling the insns of the |
525 | - control part relative to the branch when the control part |
526 | - has more than one insn. */ |
527 | - normalize_sched_times (ps); |
528 | - rotate_partial_schedule (ps, PS_MIN_CYCLE (ps)); |
529 | - set_columns_for_ps (ps); |
530 | - |
531 | - canon_loop (loop); |
532 | - |
533 | + |
534 | /* case the BCT count is not known , Do loop-versioning */ |
535 | if (count_reg && ! count_init) |
536 | { |
537 | @@ -1760,12 +1791,6 @@ |
538 | continue; |
539 | } |
540 | |
541 | - if (JUMP_P (insn)) /* Closing branch handled later. */ |
542 | - { |
543 | - RESET_BIT (tobe_scheduled, u); |
544 | - continue; |
545 | - } |
546 | - |
547 | if (TEST_BIT (sched_nodes, u)) |
548 | continue; |
549 | |
550 | @@ -1893,8 +1918,8 @@ |
551 | if (dump_file) |
552 | fprintf (dump_file, "split_row=%d\n", split_row); |
553 | |
554 | - normalize_sched_times (ps); |
555 | - rotate_partial_schedule (ps, ps->min_cycle); |
556 | + reset_sched_times (ps, PS_MIN_CYCLE (ps)); |
557 | + rotate_partial_schedule (ps, PS_MIN_CYCLE (ps)); |
558 | |
559 | rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr)); |
560 | for (row = 0; row < split_row; row++) |
561 | @@ -2571,6 +2596,7 @@ |
562 | ps_insn_ptr next_ps_i; |
563 | ps_insn_ptr first_must_follow = NULL; |
564 | ps_insn_ptr last_must_precede = NULL; |
565 | + ps_insn_ptr last_in_row = NULL; |
566 | int row; |
567 | |
568 | if (! ps_i) |
569 | @@ -2597,8 +2623,37 @@ |
570 | else |
571 | last_must_precede = next_ps_i; |
572 | } |
573 | + /* The closing branch must be the last in the row. */ |
574 | + if (must_precede |
575 | + && TEST_BIT (must_precede, next_ps_i->node->cuid) |
576 | + && JUMP_P (next_ps_i->node->insn)) |
577 | + return false; |
578 | + |
579 | + last_in_row = next_ps_i; |
580 | } |
581 | |
582 | + /* The closing branch is scheduled as well. Make sure there is no |
583 | + dependent instruction after it as the branch should be the last |
584 | + instruction in the row. */ |
585 | + if (JUMP_P (ps_i->node->insn)) |
586 | + { |
587 | + if (first_must_follow) |
588 | + return false; |
589 | + if (last_in_row) |
590 | + { |
591 | + /* Make the branch the last in the row. New instructions |
592 | + will be inserted at the beginning of the row or after the |
593 | + last must_precede instruction thus the branch is guaranteed |
594 | + to remain the last instruction in the row. */ |
595 | + last_in_row->next_in_row = ps_i; |
596 | + ps_i->prev_in_row = last_in_row; |
597 | + ps_i->next_in_row = NULL; |
598 | + } |
599 | + else |
600 | + ps->rows[row] = ps_i; |
601 | + return true; |
602 | + } |
603 | + |
604 | /* Now insert the node after INSERT_AFTER_PSI. */ |
605 | |
606 | if (! last_must_precede) |
607 | @@ -2820,6 +2875,24 @@ |
608 | return ps_i; |
609 | } |
610 | |
611 | +/* Calculate the stage count of the partial schedule PS. The calculation |
612 | + takes into account the rotation to bring the closing branch to row |
613 | + ii-1. */ |
614 | +int |
615 | +calculate_stage_count (partial_schedule_ptr ps) |
616 | +{ |
617 | + int rotation_amount = (SCHED_TIME (ps->g->closing_branch)) + 1; |
618 | + int new_min_cycle = PS_MIN_CYCLE (ps) - rotation_amount; |
619 | + int new_max_cycle = PS_MAX_CYCLE (ps) - rotation_amount; |
620 | + int stage_count = CALC_STAGE_COUNT (-1, new_min_cycle, ps->ii); |
621 | + |
622 | + /* The calculation of stage count is done adding the number of stages |
623 | + before cycle zero and after cycle zero. */ |
624 | + stage_count += CALC_STAGE_COUNT (new_max_cycle, 0, ps->ii); |
625 | + |
626 | + return stage_count; |
627 | +} |
628 | + |
629 | /* Rotate the rows of PS such that insns scheduled at time |
630 | START_CYCLE will appear in row 0. Updates max/min_cycles. */ |
631 | void |
632 | |
633 | === modified file 'gcc/params.def' |
634 | --- gcc/params.def 2011-04-18 11:31:29 +0000 |
635 | +++ gcc/params.def 2011-05-11 10:02:07 +0000 |
636 | @@ -344,6 +344,11 @@ |
637 | "sms-max-ii-factor", |
638 | "A factor for tuning the upper bound that swing modulo scheduler uses for scheduling a loop", |
639 | 100, 0, 0) |
640 | +/* The minimum value of stage count that swing modulo scheduler will generate. */ |
641 | +DEFPARAM(PARAM_SMS_MIN_SC, |
642 | + "sms-min-sc", |
643 | + "The minimum value of stage count that swing modulo scheduler will generate.", |
644 | + 2, 1, 1) |
645 | DEFPARAM(PARAM_SMS_DFA_HISTORY, |
646 | "sms-dfa-history", |
647 | "The number of cycles the swing modulo scheduler considers when checking conflicts using DFA", |
cbuild has taken a snapshot of this branch at r106745 and queued it for build.
The snapshot is available at: ex.seabright. co.nz/snapshots /gcc-linaro- 4.6+bzr106745~ eres~sms_ doloop_ for_ARM_ new_version- 4_6.tar. xdelta3. xz
http://
and will be built on the following builders:
a9-builder i686 x86_64
You can track the build queue at: ex.seabright. co.nz/helpers/ scheduler
http://
cbuild-snapshot: gcc-linaro- 4.6+bzr106745~ eres~sms_ doloop_ for_ARM_ new_version- 4_6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106744
cbuild-state: check