Merge lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6 into lp:gcc-linaro/4.6

Proposed by Revital Eres
Status: Merged
Approved by: Richard Sandiford
Approved revision: no longer in the source branch.
Merged at revision: 106747
Proposed branch: lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6
Merge into: lp:gcc-linaro/4.6
Diff against target: 647 lines (+303/-75)
8 files modified
ChangeLog.linaro (+30/-0)
gcc/config/arm/arm.md (+1/-1)
gcc/config/arm/thumb2.md (+52/-1)
gcc/ddg.c (+5/-0)
gcc/doc/invoke.texi (+4/-0)
gcc/loop-doloop.c (+67/-7)
gcc/modulo-sched.c (+139/-66)
gcc/params.def (+5/-0)
To merge this branch: bzr merge lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6
Reviewer Review Type Date Requested Status
Linaro Toolchain Developers Pending
Review via email: mp+60601@code.launchpad.net

This proposal supersedes a proposal from 2011-05-11.

Description of the change

Implement doloop pattern for ARM and support it in SMS.
The change consists of these patches that sent upstream:
1) Doloop pattern for ARM:
http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html (approved for trunk)
2) Support closing_branch_deps:
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00592.html (approved for trunk)
3) New flag to apply SMS when SC equals 1
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg01304.html (approved for trunk)

To post a comment you must log in.
Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild has taken a snapshot of this branch at r106745 and queued it for build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6.tar.xdelta3.xz

and will be built on the following builders:
 a9-builder i686 x86_64

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106744
cbuild-state: check

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on armv7l-maverick-cbuild114-ursa4-cortexa9r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/armv7l-maverick-cbuild114-ursa4-cortexa9r1

The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106744:
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O1 comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O2 comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O2 -flto comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O2 -flto -flto-partition=none comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -fomit-frame-pointer comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -fomit-frame-pointer -funroll-loops comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -g comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -Os comparison

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/armv7l-maverick-cbuild114-ursa4-cortexa9r1/gcc-testsuite.txt

cbuild-checked: armv7l-maverick-cbuild114-ursa4-cortexa9r1

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on i686-lucid-cbuild114-scorpius-i686r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/i686-lucid-cbuild114-scorpius-i686r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106744.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/i686-lucid-cbuild114-scorpius-i686r1/gcc-testsuite.txt

cbuild-checked: i686-lucid-cbuild114-scorpius-i686r1

Revision history for this message
Richard Sandiford (rsandifo) wrote :

OK.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'ChangeLog.linaro'
2--- ChangeLog.linaro 2011-05-06 11:33:02 +0000
3+++ ChangeLog.linaro 2011-05-11 10:02:07 +0000
4@@ -1,3 +1,33 @@
5+2011-05-11 Revital Eres <revital.eres@linaro.org>
6+
7+ gcc/
8+ * loop-doloop.c (doloop_condition_get): Support new form of
9+ doloop pattern and use prev_nondebug_insn instead of PREV_INSN.
10+ * config/arm/thumb2.md (*thumb2_addsi3_compare0): Remove "*".
11+ (doloop_end): New.
12+ * config/arm/arm.md (*addsi3_compare0): Remove "*".
13+ * params.def (sms-min-sc): New param flag.
14+ * doc/invoke.texi (sms-min-sc): Document it.
15+ * ddg.c (create_ddg_dep_from_intra_loop_link): If a true dep edge
16+ enters the branch create an anti edge in the opposite direction
17+ to prevent the creation of reg-moves.
18+ * modulo-sched.c: Adjust comment to reflect the fact we are
19+ scheduling closing branch.
20+ (PS_STAGE_COUNT): Rename to CALC_STAGE_COUNT and redefine.
21+ (stage_count): New field in struct partial_schedule.
22+ (calculate_stage_count): New function.
23+ (normalize_sched_times): Rename to reset_sched_times and handle
24+ incrementing the sched time of the nodes by a constant value
25+ passed as parameter.
26+ (duplicate_insns_of_cycles): Skip closing branch.
27+ (sms_schedule_by_order): Schedule closing branch.
28+ (ps_insn_find_column): Handle closing branch.
29+ (sms_schedule): Call reset_sched_times and adjust the code to
30+ support scheduling of the closing branch. Use sms-min-sc.
31+ Support new form of doloop pattern.
32+ (ps_insert_empty_row): Update calls to normalize_sched_times
33+ and rotate_partial_schedule functions.
34+
35 2011-05-06 Richard Sandiford <richard.sandiford@linaro.org>
36
37 gcc/
38
39=== modified file 'gcc/config/arm/arm.md'
40--- gcc/config/arm/arm.md 2011-04-13 14:06:16 +0000
41+++ gcc/config/arm/arm.md 2011-05-11 10:02:07 +0000
42@@ -791,7 +791,7 @@
43 ""
44 )
45
46-(define_insn "*addsi3_compare0"
47+(define_insn "addsi3_compare0"
48 [(set (reg:CC_NOOV CC_REGNUM)
49 (compare:CC_NOOV
50 (plus:SI (match_operand:SI 1 "s_register_operand" "r, r")
51
52=== modified file 'gcc/config/arm/thumb2.md'
53--- gcc/config/arm/thumb2.md 2011-01-03 20:52:22 +0000
54+++ gcc/config/arm/thumb2.md 2011-05-11 10:02:07 +0000
55@@ -836,7 +836,7 @@
56 "operands[4] = GEN_INT (- INTVAL (operands[2]));"
57 )
58
59-(define_insn "*thumb2_addsi3_compare0"
60+(define_insn "thumb2_addsi3_compare0"
61 [(set (reg:CC_NOOV CC_REGNUM)
62 (compare:CC_NOOV
63 (plus:SI (match_operand:SI 1 "s_register_operand" "l, 0, r")
64@@ -1118,3 +1118,54 @@
65 "
66 operands[2] = GEN_INT (32 - INTVAL (operands[2]));
67 ")
68+
69+;; Define the subtract-one-and-jump insns so loop.c
70+;; knows what to generate.
71+(define_expand "doloop_end"
72+ [(use (match_operand 0 "" "")) ; loop pseudo
73+ (use (match_operand 1 "" "")) ; iterations; zero if unknown
74+ (use (match_operand 2 "" "")) ; max iterations
75+ (use (match_operand 3 "" "")) ; loop level
76+ (use (match_operand 4 "" ""))] ; label
77+ "TARGET_32BIT"
78+ "
79+ {
80+ /* Currently SMS relies on the do-loop pattern to recognize loops
81+ where (1) the control part consists of all insns defining and/or
82+ using a certain 'count' register and (2) the loop count can be
83+ adjusted by modifying this register prior to the loop.
84+ ??? The possible introduction of a new block to initialize the
85+ new IV can potentially affect branch optimizations. */
86+ if (optimize > 0 && flag_modulo_sched)
87+ {
88+ rtx s0;
89+ rtx bcomp;
90+ rtx loc_ref;
91+ rtx cc_reg;
92+ rtx insn;
93+ rtx cmp;
94+
95+ /* Only use this on innermost loops. */
96+ if (INTVAL (operands[3]) > 1)
97+ FAIL;
98+ if (GET_MODE (operands[0]) != SImode)
99+ FAIL;
100+
101+ s0 = operands [0];
102+ if (TARGET_THUMB2)
103+ insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1)));
104+ else
105+ insn = emit_insn (gen_addsi3_compare0 (s0, s0, GEN_INT (-1)));
106+
107+ cmp = XVECEXP (PATTERN (insn), 0, 0);
108+ cc_reg = SET_DEST (cmp);
109+ bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
110+ loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [4]);
111+ emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
112+ gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
113+ loc_ref, pc_rtx)));
114+ DONE;
115+ }else
116+ FAIL;
117+}")
118+
119
120=== modified file 'gcc/ddg.c'
121--- gcc/ddg.c 2010-11-30 11:41:24 +0000
122+++ gcc/ddg.c 2011-05-11 10:02:07 +0000
123@@ -197,6 +197,11 @@
124 }
125 }
126
127+ /* If a true dep edge enters the branch create an anti edge in the
128+ opposite direction to prevent the creation of reg-moves. */
129+ if ((DEP_TYPE (link) == REG_DEP_TRUE) && JUMP_P (dest_node->insn))
130+ create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1);
131+
132 latency = dep_cost (link);
133 e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
134 add_edge_to_ddg (g, e);
135
136=== modified file 'gcc/doc/invoke.texi'
137--- gcc/doc/invoke.texi 2011-04-18 11:31:29 +0000
138+++ gcc/doc/invoke.texi 2011-05-11 10:02:07 +0000
139@@ -8730,6 +8730,10 @@
140 The maximum number of best instructions in the ready list that are considered
141 for renaming in the selective scheduler. The default value is 2.
142
143+@item sms-min-sc
144+The minimum value of stage count that swing modulo scheduler will
145+generate. The default value is 2.
146+
147 @item max-last-value-rtl
148 The maximum size measured as number of RTLs that can be recorded in an expression
149 in combiner for a pseudo register as last known value of that register. The default
150
151=== modified file 'gcc/loop-doloop.c'
152--- gcc/loop-doloop.c 2010-11-30 11:41:24 +0000
153+++ gcc/loop-doloop.c 2011-05-11 10:02:07 +0000
154@@ -78,6 +78,8 @@
155 rtx inc_src;
156 rtx condition;
157 rtx pattern;
158+ rtx cc_reg = NULL_RTX;
159+ rtx reg_orig = NULL_RTX;
160
161 /* The canonical doloop pattern we expect has one of the following
162 forms:
163@@ -96,7 +98,16 @@
164 2) (set (reg) (plus (reg) (const_int -1))
165 (set (pc) (if_then_else (reg != 0)
166 (label_ref (label))
167- (pc))). */
168+ (pc))).
169+
170+ Some targets (ARM) do the comparison before the branch, as in the
171+ following form:
172+
173+ 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0)))
174+ (set (reg) (plus (reg) (const_int -1)))])
175+ (set (pc) (if_then_else (cc == NE)
176+ (label_ref (label))
177+ (pc))) */
178
179 pattern = PATTERN (doloop_pat);
180
181@@ -104,19 +115,47 @@
182 {
183 rtx cond;
184 rtx prev_insn = prev_nondebug_insn (doloop_pat);
185+ rtx cmp_arg1, cmp_arg2;
186+ rtx cmp_orig;
187
188- /* We expect the decrement to immediately precede the branch. */
189+ /* In case the pattern is not PARALLEL we expect two forms
190+ of doloop which are cases 2) and 3) above: in case 2) the
191+ decrement immediately precedes the branch, while in case 3)
192+ the compare and decrement instructions immediately precede
193+ the branch. */
194
195 if (prev_insn == NULL_RTX || !INSN_P (prev_insn))
196 return 0;
197
198 cmp = pattern;
199- inc = PATTERN (PREV_INSN (doloop_pat));
200+ if (GET_CODE (PATTERN (prev_insn)) == PARALLEL)
201+ {
202+ /* The third case: the compare and decrement instructions
203+ immediately precede the branch. */
204+ cmp_orig = XVECEXP (PATTERN (prev_insn), 0, 0);
205+ if (GET_CODE (cmp_orig) != SET)
206+ return 0;
207+ if (GET_CODE (SET_SRC (cmp_orig)) != COMPARE)
208+ return 0;
209+ cmp_arg1 = XEXP (SET_SRC (cmp_orig), 0);
210+ cmp_arg2 = XEXP (SET_SRC (cmp_orig), 1);
211+ if (cmp_arg2 != const0_rtx
212+ || GET_CODE (cmp_arg1) != PLUS)
213+ return 0;
214+ reg_orig = XEXP (cmp_arg1, 0);
215+ if (XEXP (cmp_arg1, 1) != GEN_INT (-1)
216+ || !REG_P (reg_orig))
217+ return 0;
218+ cc_reg = SET_DEST (cmp_orig);
219+
220+ inc = XVECEXP (PATTERN (prev_insn), 0, 1);
221+ }
222+ else
223+ inc = PATTERN (prev_insn);
224 /* We expect the condition to be of the form (reg != 0) */
225 cond = XEXP (SET_SRC (cmp), 0);
226 if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)
227 return 0;
228-
229 }
230 else
231 {
232@@ -162,11 +201,15 @@
233 return 0;
234
235 if ((XEXP (condition, 0) == reg)
236+ /* For the third case: */
237+ || ((cc_reg != NULL_RTX)
238+ && (XEXP (condition, 0) == cc_reg)
239+ && (reg_orig == reg))
240 || (GET_CODE (XEXP (condition, 0)) == PLUS
241- && XEXP (XEXP (condition, 0), 0) == reg))
242+ && XEXP (XEXP (condition, 0), 0) == reg))
243 {
244 if (GET_CODE (pattern) != PARALLEL)
245- /* The second form we expect:
246+ /* For the second form we expect:
247
248 (set (reg) (plus (reg) (const_int -1))
249 (set (pc) (if_then_else (reg != 0)
250@@ -181,7 +224,24 @@
251 (set (reg) (plus (reg) (const_int -1)))
252 (additional clobbers and uses)])
253
254- So we return that form instead.
255+ For the third form we expect:
256+
257+ (parallel [(set (cc) (compare ((plus (reg) (const_int -1)), 0))
258+ (set (reg) (plus (reg) (const_int -1)))])
259+ (set (pc) (if_then_else (cc == NE)
260+ (label_ref (label))
261+ (pc)))
262+
263+ which is equivalent to the following:
264+
265+ (parallel [(set (cc) (compare (reg, 1))
266+ (set (reg) (plus (reg) (const_int -1)))
267+ (set (pc) (if_then_else (NE == cc)
268+ (label_ref (label))
269+ (pc))))])
270+
271+ So we return the second form instead for the two cases.
272+
273 */
274 condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);
275
276
277=== modified file 'gcc/modulo-sched.c'
278--- gcc/modulo-sched.c 2011-02-14 17:59:10 +0000
279+++ gcc/modulo-sched.c 2011-05-11 10:02:07 +0000
280@@ -84,14 +84,13 @@
281 II cycles (i.e. use register copies to prevent a def from overwriting
282 itself before reaching the use).
283
284- SMS works with countable loops (1) whose control part can be easily
285- decoupled from the rest of the loop and (2) whose loop count can
286- be easily adjusted. This is because we peel a constant number of
287- iterations into a prologue and epilogue for which we want to avoid
288- emitting the control part, and a kernel which is to iterate that
289- constant number of iterations less than the original loop. So the
290- control part should be a set of insns clearly identified and having
291- its own iv, not otherwise used in the loop (at-least for now), which
292+ SMS works with countable loops whose loop count can be easily
293+ adjusted. This is because we peel a constant number of iterations
294+ into a prologue and epilogue for which we want to avoid emitting
295+ the control part, and a kernel which is to iterate that constant
296+ number of iterations less than the original loop. So the control
297+ part should be a set of insns clearly identified and having its
298+ own iv, not otherwise used in the loop (at-least for now), which
299 initializes a register before the loop to the number of iterations.
300 Currently SMS relies on the do-loop pattern to recognize such loops,
301 where (1) the control part comprises of all insns defining and/or
302@@ -116,8 +115,10 @@
303
304 /* The number of different iterations the nodes in ps span, assuming
305 the stage boundaries are placed efficiently. */
306-#define PS_STAGE_COUNT(ps) ((PS_MAX_CYCLE (ps) - PS_MIN_CYCLE (ps) \
307- + 1 + (ps)->ii - 1) / (ps)->ii)
308+#define CALC_STAGE_COUNT(max_cycle,min_cycle,ii) ((max_cycle - min_cycle \
309+ + 1 + ii - 1) / ii)
310+/* The stage count of ps. */
311+#define PS_STAGE_COUNT(ps) (((partial_schedule_ptr)(ps))->stage_count)
312
313 /* A single instruction in the partial schedule. */
314 struct ps_insn
315@@ -155,6 +156,8 @@
316 int max_cycle;
317
318 ddg_ptr g; /* The DDG of the insns in the partial schedule. */
319+
320+ int stage_count; /* The stage count of the partial schedule. */
321 };
322
323 /* We use this to record all the register replacements we do in
324@@ -195,7 +198,7 @@
325 rtx, rtx);
326 static void duplicate_insns_of_cycles (partial_schedule_ptr,
327 int, int, int, rtx);
328-
329+static int calculate_stage_count (partial_schedule_ptr ps);
330 #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
331 #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
332 #define SCHED_FIRST_REG_MOVE(x) \
333@@ -310,10 +313,10 @@
334 either a single (parallel) branch-on-count or a (non-parallel)
335 branch immediately preceded by a single (decrement) insn. */
336 first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail
337- : PREV_INSN (tail));
338+ : prev_nondebug_insn (tail));
339
340 for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn))
341- if (reg_mentioned_p (reg, insn))
342+ if (reg_mentioned_p (reg, insn) && !DEBUG_INSN_P (insn))
343 {
344 if (dump_file)
345 {
346@@ -569,13 +572,12 @@
347 }
348 }
349
350-/* Bump the SCHED_TIMEs of all nodes to start from zero. Set the values
351- of SCHED_ROW and SCHED_STAGE. */
352+/* Bump the SCHED_TIMEs of all nodes by AMOUNT. Set the values of
353+ SCHED_ROW and SCHED_STAGE. */
354 static void
355-normalize_sched_times (partial_schedule_ptr ps)
356+reset_sched_times (partial_schedule_ptr ps, int amount)
357 {
358 int row;
359- int amount = PS_MIN_CYCLE (ps);
360 int ii = ps->ii;
361 ps_insn_ptr crr_insn;
362
363@@ -584,19 +586,43 @@
364 {
365 ddg_node_ptr u = crr_insn->node;
366 int normalized_time = SCHED_TIME (u) - amount;
367+ int new_min_cycle = PS_MIN_CYCLE (ps) - amount;
368+ int sc_until_cycle_zero, stage;
369
370- if (dump_file)
371- fprintf (dump_file, "crr_insn->node=%d, crr_insn->cycle=%d,\
372- min_cycle=%d\n", crr_insn->node->cuid, SCHED_TIME
373- (u), ps->min_cycle);
374+ if (dump_file)
375+ {
376+ /* Print the scheduling times after the rotation. */
377+ fprintf (dump_file, "crr_insn->node=%d (insn id %d), "
378+ "crr_insn->cycle=%d, min_cycle=%d", crr_insn->node->cuid,
379+ INSN_UID (crr_insn->node->insn), SCHED_TIME (u),
380+ normalized_time);
381+ if (JUMP_P (crr_insn->node->insn))
382+ fprintf (dump_file, " (branch)");
383+ fprintf (dump_file, "\n");
384+ }
385+
386 gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
387 gcc_assert (SCHED_TIME (u) <= ps->max_cycle);
388 SCHED_TIME (u) = normalized_time;
389- SCHED_ROW (u) = normalized_time % ii;
390- SCHED_STAGE (u) = normalized_time / ii;
391+ SCHED_ROW (u) = SMODULO (normalized_time, ii);
392+
393+ /* The calculation of stage count is done adding the number
394+ of stages before cycle zero and after cycle zero. */
395+ sc_until_cycle_zero = CALC_STAGE_COUNT (-1, new_min_cycle, ii);
396+
397+ if (SCHED_TIME (u) < 0)
398+ {
399+ stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii);
400+ SCHED_STAGE (u) = sc_until_cycle_zero - stage;
401+ }
402+ else
403+ {
404+ stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii);
405+ SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1;
406+ }
407 }
408 }
409-
410+
411 /* Set SCHED_COLUMN of each node according to its position in PS. */
412 static void
413 set_columns_for_ps (partial_schedule_ptr ps)
414@@ -646,9 +672,12 @@
415
416 /* Do not duplicate any insn which refers to count_reg as it
417 belongs to the control part.
418+ The closing branch is scheduled as well and thus should
419+ be ignored.
420 TODO: This should be done by analyzing the control part of
421 the loop. */
422- if (reg_mentioned_p (count_reg, u_node->insn))
423+ if (reg_mentioned_p (count_reg, u_node->insn)
424+ || JUMP_P (ps_ij->node->insn))
425 continue;
426
427 if (for_prolog)
428@@ -1009,9 +1038,11 @@
429 continue;
430 }
431
432- /* Don't handle BBs with calls or barriers, or !single_set insns,
433- or auto-increment insns (to avoid creating invalid reg-moves
434- for the auto-increment insns).
435+ /* Don't handle BBs with calls or barriers or auto-increment insns
436+ (to avoid creating invalid reg-moves for the auto-increment insns),
437+ or !single_set with the exception of instructions that include
438+ count_reg---these instructions are part of the control part
439+ that do-loop recognizes.
440 ??? Should handle auto-increment insns.
441 ??? Should handle insns defining subregs. */
442 for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn))
443@@ -1021,7 +1052,8 @@
444 if (CALL_P (insn)
445 || BARRIER_P (insn)
446 || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
447- && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)
448+ && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE
449+ && !reg_mentioned_p (count_reg, insn))
450 || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
451 || (INSN_P (insn) && (set = single_set (insn))
452 && GET_CODE (SET_DEST (set)) == SUBREG))
453@@ -1049,7 +1081,11 @@
454 continue;
455 }
456
457- if (! (g = create_ddg (bb, 0)))
458+ /* Always schedule the closing branch with the rest of the
459+ instructions. The branch is rotated to be in row ii-1 at the
460+ end of the scheduling procedure to make sure it's the last
461+ instruction in the iteration. */
462+ if (! (g = create_ddg (bb, 1)))
463 {
464 if (dump_file)
465 fprintf (dump_file, "SMS create_ddg failed\n");
466@@ -1157,14 +1193,17 @@
467
468 ps = sms_schedule_by_order (g, mii, maxii, node_order);
469
470- if (ps){
471- stage_count = PS_STAGE_COUNT (ps);
472- gcc_assert(stage_count >= 1);
473- }
474+ if (ps)
475+ {
476+ stage_count = calculate_stage_count (ps);
477+ gcc_assert(stage_count >= 1);
478+ PS_STAGE_COUNT(ps) = stage_count;
479+ }
480
481- /* Stage count of 1 means that there is no interleaving between
482- iterations, let the scheduling passes do the job. */
483- if (stage_count <= 1
484+ /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
485+ 1 means that there is no interleaving between iterations thus
486+ we let the scheduling passes do the job in this case. */
487+ if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
488 || (count_init && (loop_count <= stage_count))
489 || (flag_branch_probabilities && (trip_count <= stage_count)))
490 {
491@@ -1182,32 +1221,24 @@
492 else
493 {
494 struct undo_replace_buff_elem *reg_move_replaces;
495-
496- if (dump_file)
497- {
498+ int amount = SCHED_TIME (g->closing_branch) + 1;
499+
500+ /* Set the stage boundaries. The closing_branch was scheduled
501+ and should appear in the last (ii-1) row. */
502+ reset_sched_times (ps, amount);
503+ rotate_partial_schedule (ps, amount);
504+ set_columns_for_ps (ps);
505+
506+ canon_loop (loop);
507+
508+ if (dump_file)
509+ {
510 fprintf (dump_file,
511 "SMS succeeded %d %d (with ii, sc)\n", ps->ii,
512 stage_count);
513 print_partial_schedule (ps, dump_file);
514- fprintf (dump_file,
515- "SMS Branch (%d) will later be scheduled at cycle %d.\n",
516- g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
517 }
518-
519- /* Set the stage boundaries. If the DDG is built with closing_branch_deps,
520- the closing_branch was scheduled and should appear in the last (ii-1)
521- row. Otherwise, we are free to schedule the branch, and we let nodes
522- that were scheduled at the first PS_MIN_CYCLE cycle appear in the first
523- row; this should reduce stage_count to minimum.
524- TODO: Revisit the issue of scheduling the insns of the
525- control part relative to the branch when the control part
526- has more than one insn. */
527- normalize_sched_times (ps);
528- rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
529- set_columns_for_ps (ps);
530-
531- canon_loop (loop);
532-
533+
534 /* case the BCT count is not known , Do loop-versioning */
535 if (count_reg && ! count_init)
536 {
537@@ -1760,12 +1791,6 @@
538 continue;
539 }
540
541- if (JUMP_P (insn)) /* Closing branch handled later. */
542- {
543- RESET_BIT (tobe_scheduled, u);
544- continue;
545- }
546-
547 if (TEST_BIT (sched_nodes, u))
548 continue;
549
550@@ -1893,8 +1918,8 @@
551 if (dump_file)
552 fprintf (dump_file, "split_row=%d\n", split_row);
553
554- normalize_sched_times (ps);
555- rotate_partial_schedule (ps, ps->min_cycle);
556+ reset_sched_times (ps, PS_MIN_CYCLE (ps));
557+ rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
558
559 rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
560 for (row = 0; row < split_row; row++)
561@@ -2571,6 +2596,7 @@
562 ps_insn_ptr next_ps_i;
563 ps_insn_ptr first_must_follow = NULL;
564 ps_insn_ptr last_must_precede = NULL;
565+ ps_insn_ptr last_in_row = NULL;
566 int row;
567
568 if (! ps_i)
569@@ -2597,8 +2623,37 @@
570 else
571 last_must_precede = next_ps_i;
572 }
573+ /* The closing branch must be the last in the row. */
574+ if (must_precede
575+ && TEST_BIT (must_precede, next_ps_i->node->cuid)
576+ && JUMP_P (next_ps_i->node->insn))
577+ return false;
578+
579+ last_in_row = next_ps_i;
580 }
581
582+ /* The closing branch is scheduled as well. Make sure there is no
583+ dependent instruction after it as the branch should be the last
584+ instruction in the row. */
585+ if (JUMP_P (ps_i->node->insn))
586+ {
587+ if (first_must_follow)
588+ return false;
589+ if (last_in_row)
590+ {
591+ /* Make the branch the last in the row. New instructions
592+ will be inserted at the beginning of the row or after the
593+ last must_precede instruction thus the branch is guaranteed
594+ to remain the last instruction in the row. */
595+ last_in_row->next_in_row = ps_i;
596+ ps_i->prev_in_row = last_in_row;
597+ ps_i->next_in_row = NULL;
598+ }
599+ else
600+ ps->rows[row] = ps_i;
601+ return true;
602+ }
603+
604 /* Now insert the node after INSERT_AFTER_PSI. */
605
606 if (! last_must_precede)
607@@ -2820,6 +2875,24 @@
608 return ps_i;
609 }
610
611+/* Calculate the stage count of the partial schedule PS. The calculation
612+ takes into account the rotation to bring the closing branch to row
613+ ii-1. */
614+int
615+calculate_stage_count (partial_schedule_ptr ps)
616+{
617+ int rotation_amount = (SCHED_TIME (ps->g->closing_branch)) + 1;
618+ int new_min_cycle = PS_MIN_CYCLE (ps) - rotation_amount;
619+ int new_max_cycle = PS_MAX_CYCLE (ps) - rotation_amount;
620+ int stage_count = CALC_STAGE_COUNT (-1, new_min_cycle, ps->ii);
621+
622+ /* The calculation of stage count is done adding the number of stages
623+ before cycle zero and after cycle zero. */
624+ stage_count += CALC_STAGE_COUNT (new_max_cycle, 0, ps->ii);
625+
626+ return stage_count;
627+}
628+
629 /* Rotate the rows of PS such that insns scheduled at time
630 START_CYCLE will appear in row 0. Updates max/min_cycles. */
631 void
632
633=== modified file 'gcc/params.def'
634--- gcc/params.def 2011-04-18 11:31:29 +0000
635+++ gcc/params.def 2011-05-11 10:02:07 +0000
636@@ -344,6 +344,11 @@
637 "sms-max-ii-factor",
638 "A factor for tuning the upper bound that swing modulo scheduler uses for scheduling a loop",
639 100, 0, 0)
640+/* The minimum value of stage count that swing modulo scheduler will generate. */
641+DEFPARAM(PARAM_SMS_MIN_SC,
642+ "sms-min-sc",
643+ "The minimum value of stage count that swing modulo scheduler will generate.",
644+ 2, 1, 1)
645 DEFPARAM(PARAM_SMS_DFA_HISTORY,
646 "sms-dfa-history",
647 "The number of cycles the swing modulo scheduler considers when checking conflicts using DFA",

Subscribers

People subscribed via source and target branches