Merge lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6 into lp:gcc-linaro/4.6

Proposed by Revital Eres
Status: Merged
Approved by: Richard Sandiford
Approved revision: no longer in the source branch.
Merged at revision: 106747
Proposed branch: lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6
Merge into: lp:gcc-linaro/4.6
Diff against target: 647 lines (+303/-75)
8 files modified
ChangeLog.linaro (+30/-0)
gcc/config/arm/arm.md (+1/-1)
gcc/config/arm/thumb2.md (+52/-1)
gcc/ddg.c (+5/-0)
gcc/doc/invoke.texi (+4/-0)
gcc/loop-doloop.c (+67/-7)
gcc/modulo-sched.c (+139/-66)
gcc/params.def (+5/-0)
To merge this branch: bzr merge lp:~eres/gcc-linaro/sms_doloop_for_ARM_new_version-4_6
Reviewer Review Type Date Requested Status
Linaro Toolchain Developers Pending
Review via email: mp+60601@code.launchpad.net

This proposal supersedes a proposal from 2011-05-11.

Description of the change

Implement doloop pattern for ARM and support it in SMS.
The change consists of these patches that sent upstream:
1) Doloop pattern for ARM:
http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html (approved for trunk)
2) Support closing_branch_deps:
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00592.html (approved for trunk)
3) New flag to apply SMS when SC equals 1
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg01304.html (approved for trunk)

To post a comment you must log in.
Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild has taken a snapshot of this branch at r106745 and queued it for build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6.tar.xdelta3.xz

and will be built on the following builders:
 a9-builder i686 x86_64

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106744
cbuild-state: check

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on armv7l-maverick-cbuild114-ursa4-cortexa9r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/armv7l-maverick-cbuild114-ursa4-cortexa9r1

The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106744:
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O1 comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O2 comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O2 -flto comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O2 -flto -flto-partition=none comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -fomit-frame-pointer comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -fomit-frame-pointer -funroll-loops comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -O3 -g comparison
 +PASS: gcc.c-torture/unsorted/dump-noaddr.c.169r.loop2_doloop, -Os comparison

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/armv7l-maverick-cbuild114-ursa4-cortexa9r1/gcc-testsuite.txt

cbuild-checked: armv7l-maverick-cbuild114-ursa4-cortexa9r1

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on i686-lucid-cbuild114-scorpius-i686r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/i686-lucid-cbuild114-scorpius-i686r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106744.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106745~eres~sms_doloop_for_ARM_new_version-4_6/logs/i686-lucid-cbuild114-scorpius-i686r1/gcc-testsuite.txt

cbuild-checked: i686-lucid-cbuild114-scorpius-i686r1

Revision history for this message
Richard Sandiford (rsandifo) wrote :

OK.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'ChangeLog.linaro'
--- ChangeLog.linaro 2011-05-06 11:33:02 +0000
+++ ChangeLog.linaro 2011-05-11 10:02:07 +0000
@@ -1,3 +1,33 @@
12011-05-11 Revital Eres <revital.eres@linaro.org>
2
3 gcc/
4 * loop-doloop.c (doloop_condition_get): Support new form of
5 doloop pattern and use prev_nondebug_insn instead of PREV_INSN.
6 * config/arm/thumb2.md (*thumb2_addsi3_compare0): Remove "*".
7 (doloop_end): New.
8 * config/arm/arm.md (*addsi3_compare0): Remove "*".
9 * params.def (sms-min-sc): New param flag.
10 * doc/invoke.texi (sms-min-sc): Document it.
11 * ddg.c (create_ddg_dep_from_intra_loop_link): If a true dep edge
12 enters the branch create an anti edge in the opposite direction
13 to prevent the creation of reg-moves.
14 * modulo-sched.c: Adjust comment to reflect the fact we are
15 scheduling closing branch.
16 (PS_STAGE_COUNT): Rename to CALC_STAGE_COUNT and redefine.
17 (stage_count): New field in struct partial_schedule.
18 (calculate_stage_count): New function.
19 (normalize_sched_times): Rename to reset_sched_times and handle
20 incrementing the sched time of the nodes by a constant value
21 passed as parameter.
22 (duplicate_insns_of_cycles): Skip closing branch.
23 (sms_schedule_by_order): Schedule closing branch.
24 (ps_insn_find_column): Handle closing branch.
25 (sms_schedule): Call reset_sched_times and adjust the code to
26 support scheduling of the closing branch. Use sms-min-sc.
27 Support new form of doloop pattern.
28 (ps_insert_empty_row): Update calls to normalize_sched_times
29 and rotate_partial_schedule functions.
30
12011-05-06 Richard Sandiford <richard.sandiford@linaro.org>312011-05-06 Richard Sandiford <richard.sandiford@linaro.org>
232
3 gcc/33 gcc/
434
=== modified file 'gcc/config/arm/arm.md'
--- gcc/config/arm/arm.md 2011-04-13 14:06:16 +0000
+++ gcc/config/arm/arm.md 2011-05-11 10:02:07 +0000
@@ -791,7 +791,7 @@
791 ""791 ""
792)792)
793793
794(define_insn "*addsi3_compare0"794(define_insn "addsi3_compare0"
795 [(set (reg:CC_NOOV CC_REGNUM)795 [(set (reg:CC_NOOV CC_REGNUM)
796 (compare:CC_NOOV796 (compare:CC_NOOV
797 (plus:SI (match_operand:SI 1 "s_register_operand" "r, r")797 (plus:SI (match_operand:SI 1 "s_register_operand" "r, r")
798798
=== modified file 'gcc/config/arm/thumb2.md'
--- gcc/config/arm/thumb2.md 2011-01-03 20:52:22 +0000
+++ gcc/config/arm/thumb2.md 2011-05-11 10:02:07 +0000
@@ -836,7 +836,7 @@
836 "operands[4] = GEN_INT (- INTVAL (operands[2]));"836 "operands[4] = GEN_INT (- INTVAL (operands[2]));"
837)837)
838838
839(define_insn "*thumb2_addsi3_compare0"839(define_insn "thumb2_addsi3_compare0"
840 [(set (reg:CC_NOOV CC_REGNUM)840 [(set (reg:CC_NOOV CC_REGNUM)
841 (compare:CC_NOOV841 (compare:CC_NOOV
842 (plus:SI (match_operand:SI 1 "s_register_operand" "l, 0, r")842 (plus:SI (match_operand:SI 1 "s_register_operand" "l, 0, r")
@@ -1118,3 +1118,54 @@
1118 "1118 "
1119 operands[2] = GEN_INT (32 - INTVAL (operands[2]));1119 operands[2] = GEN_INT (32 - INTVAL (operands[2]));
1120 ")1120 ")
1121
1122;; Define the subtract-one-and-jump insns so loop.c
1123;; knows what to generate.
1124(define_expand "doloop_end"
1125 [(use (match_operand 0 "" "")) ; loop pseudo
1126 (use (match_operand 1 "" "")) ; iterations; zero if unknown
1127 (use (match_operand 2 "" "")) ; max iterations
1128 (use (match_operand 3 "" "")) ; loop level
1129 (use (match_operand 4 "" ""))] ; label
1130 "TARGET_32BIT"
1131 "
1132 {
1133 /* Currently SMS relies on the do-loop pattern to recognize loops
1134 where (1) the control part consists of all insns defining and/or
1135 using a certain 'count' register and (2) the loop count can be
1136 adjusted by modifying this register prior to the loop.
1137 ??? The possible introduction of a new block to initialize the
1138 new IV can potentially affect branch optimizations. */
1139 if (optimize > 0 && flag_modulo_sched)
1140 {
1141 rtx s0;
1142 rtx bcomp;
1143 rtx loc_ref;
1144 rtx cc_reg;
1145 rtx insn;
1146 rtx cmp;
1147
1148 /* Only use this on innermost loops. */
1149 if (INTVAL (operands[3]) > 1)
1150 FAIL;
1151 if (GET_MODE (operands[0]) != SImode)
1152 FAIL;
1153
1154 s0 = operands [0];
1155 if (TARGET_THUMB2)
1156 insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1)));
1157 else
1158 insn = emit_insn (gen_addsi3_compare0 (s0, s0, GEN_INT (-1)));
1159
1160 cmp = XVECEXP (PATTERN (insn), 0, 0);
1161 cc_reg = SET_DEST (cmp);
1162 bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
1163 loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [4]);
1164 emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
1165 gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
1166 loc_ref, pc_rtx)));
1167 DONE;
1168 }else
1169 FAIL;
1170}")
1171
11211172
=== modified file 'gcc/ddg.c'
--- gcc/ddg.c 2010-11-30 11:41:24 +0000
+++ gcc/ddg.c 2011-05-11 10:02:07 +0000
@@ -197,6 +197,11 @@
197 }197 }
198 }198 }
199199
200 /* If a true dep edge enters the branch create an anti edge in the
201 opposite direction to prevent the creation of reg-moves. */
202 if ((DEP_TYPE (link) == REG_DEP_TRUE) && JUMP_P (dest_node->insn))
203 create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1);
204
200 latency = dep_cost (link);205 latency = dep_cost (link);
201 e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);206 e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
202 add_edge_to_ddg (g, e);207 add_edge_to_ddg (g, e);
203208
=== modified file 'gcc/doc/invoke.texi'
--- gcc/doc/invoke.texi 2011-04-18 11:31:29 +0000
+++ gcc/doc/invoke.texi 2011-05-11 10:02:07 +0000
@@ -8730,6 +8730,10 @@
8730The maximum number of best instructions in the ready list that are considered8730The maximum number of best instructions in the ready list that are considered
8731for renaming in the selective scheduler. The default value is 2.8731for renaming in the selective scheduler. The default value is 2.
87328732
8733@item sms-min-sc
8734The minimum value of stage count that swing modulo scheduler will
8735generate. The default value is 2.
8736
8733@item max-last-value-rtl8737@item max-last-value-rtl
8734The maximum size measured as number of RTLs that can be recorded in an expression8738The maximum size measured as number of RTLs that can be recorded in an expression
8735in combiner for a pseudo register as last known value of that register. The default8739in combiner for a pseudo register as last known value of that register. The default
87368740
=== modified file 'gcc/loop-doloop.c'
--- gcc/loop-doloop.c 2010-11-30 11:41:24 +0000
+++ gcc/loop-doloop.c 2011-05-11 10:02:07 +0000
@@ -78,6 +78,8 @@
78 rtx inc_src;78 rtx inc_src;
79 rtx condition;79 rtx condition;
80 rtx pattern;80 rtx pattern;
81 rtx cc_reg = NULL_RTX;
82 rtx reg_orig = NULL_RTX;
8183
82 /* The canonical doloop pattern we expect has one of the following84 /* The canonical doloop pattern we expect has one of the following
83 forms:85 forms:
@@ -96,7 +98,16 @@
96 2) (set (reg) (plus (reg) (const_int -1))98 2) (set (reg) (plus (reg) (const_int -1))
97 (set (pc) (if_then_else (reg != 0)99 (set (pc) (if_then_else (reg != 0)
98 (label_ref (label))100 (label_ref (label))
99 (pc))). */101 (pc))).
102
103 Some targets (ARM) do the comparison before the branch, as in the
104 following form:
105
106 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0)))
107 (set (reg) (plus (reg) (const_int -1)))])
108 (set (pc) (if_then_else (cc == NE)
109 (label_ref (label))
110 (pc))) */
100111
101 pattern = PATTERN (doloop_pat);112 pattern = PATTERN (doloop_pat);
102113
@@ -104,19 +115,47 @@
104 {115 {
105 rtx cond;116 rtx cond;
106 rtx prev_insn = prev_nondebug_insn (doloop_pat);117 rtx prev_insn = prev_nondebug_insn (doloop_pat);
118 rtx cmp_arg1, cmp_arg2;
119 rtx cmp_orig;
107120
108 /* We expect the decrement to immediately precede the branch. */121 /* In case the pattern is not PARALLEL we expect two forms
122 of doloop which are cases 2) and 3) above: in case 2) the
123 decrement immediately precedes the branch, while in case 3)
124 the compare and decrement instructions immediately precede
125 the branch. */
109126
110 if (prev_insn == NULL_RTX || !INSN_P (prev_insn))127 if (prev_insn == NULL_RTX || !INSN_P (prev_insn))
111 return 0;128 return 0;
112129
113 cmp = pattern;130 cmp = pattern;
114 inc = PATTERN (PREV_INSN (doloop_pat));131 if (GET_CODE (PATTERN (prev_insn)) == PARALLEL)
132 {
133 /* The third case: the compare and decrement instructions
134 immediately precede the branch. */
135 cmp_orig = XVECEXP (PATTERN (prev_insn), 0, 0);
136 if (GET_CODE (cmp_orig) != SET)
137 return 0;
138 if (GET_CODE (SET_SRC (cmp_orig)) != COMPARE)
139 return 0;
140 cmp_arg1 = XEXP (SET_SRC (cmp_orig), 0);
141 cmp_arg2 = XEXP (SET_SRC (cmp_orig), 1);
142 if (cmp_arg2 != const0_rtx
143 || GET_CODE (cmp_arg1) != PLUS)
144 return 0;
145 reg_orig = XEXP (cmp_arg1, 0);
146 if (XEXP (cmp_arg1, 1) != GEN_INT (-1)
147 || !REG_P (reg_orig))
148 return 0;
149 cc_reg = SET_DEST (cmp_orig);
150
151 inc = XVECEXP (PATTERN (prev_insn), 0, 1);
152 }
153 else
154 inc = PATTERN (prev_insn);
115 /* We expect the condition to be of the form (reg != 0) */155 /* We expect the condition to be of the form (reg != 0) */
116 cond = XEXP (SET_SRC (cmp), 0);156 cond = XEXP (SET_SRC (cmp), 0);
117 if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)157 if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)
118 return 0;158 return 0;
119
120 }159 }
121 else160 else
122 {161 {
@@ -162,11 +201,15 @@
162 return 0;201 return 0;
163202
164 if ((XEXP (condition, 0) == reg)203 if ((XEXP (condition, 0) == reg)
204 /* For the third case: */
205 || ((cc_reg != NULL_RTX)
206 && (XEXP (condition, 0) == cc_reg)
207 && (reg_orig == reg))
165 || (GET_CODE (XEXP (condition, 0)) == PLUS208 || (GET_CODE (XEXP (condition, 0)) == PLUS
166 && XEXP (XEXP (condition, 0), 0) == reg))209 && XEXP (XEXP (condition, 0), 0) == reg))
167 {210 {
168 if (GET_CODE (pattern) != PARALLEL)211 if (GET_CODE (pattern) != PARALLEL)
169 /* The second form we expect:212 /* For the second form we expect:
170213
171 (set (reg) (plus (reg) (const_int -1))214 (set (reg) (plus (reg) (const_int -1))
172 (set (pc) (if_then_else (reg != 0)215 (set (pc) (if_then_else (reg != 0)
@@ -181,7 +224,24 @@
181 (set (reg) (plus (reg) (const_int -1)))224 (set (reg) (plus (reg) (const_int -1)))
182 (additional clobbers and uses)])225 (additional clobbers and uses)])
183226
184 So we return that form instead.227 For the third form we expect:
228
229 (parallel [(set (cc) (compare ((plus (reg) (const_int -1)), 0))
230 (set (reg) (plus (reg) (const_int -1)))])
231 (set (pc) (if_then_else (cc == NE)
232 (label_ref (label))
233 (pc)))
234
235 which is equivalent to the following:
236
237 (parallel [(set (cc) (compare (reg, 1))
238 (set (reg) (plus (reg) (const_int -1)))
239 (set (pc) (if_then_else (NE == cc)
240 (label_ref (label))
241 (pc))))])
242
243 So we return the second form instead for the two cases.
244
185 */245 */
186 condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);246 condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);
187247
188248
=== modified file 'gcc/modulo-sched.c'
--- gcc/modulo-sched.c 2011-02-14 17:59:10 +0000
+++ gcc/modulo-sched.c 2011-05-11 10:02:07 +0000
@@ -84,14 +84,13 @@
84 II cycles (i.e. use register copies to prevent a def from overwriting84 II cycles (i.e. use register copies to prevent a def from overwriting
85 itself before reaching the use).85 itself before reaching the use).
8686
87 SMS works with countable loops (1) whose control part can be easily87 SMS works with countable loops whose loop count can be easily
88 decoupled from the rest of the loop and (2) whose loop count can88 adjusted. This is because we peel a constant number of iterations
89 be easily adjusted. This is because we peel a constant number of89 into a prologue and epilogue for which we want to avoid emitting
90 iterations into a prologue and epilogue for which we want to avoid90 the control part, and a kernel which is to iterate that constant
91 emitting the control part, and a kernel which is to iterate that91 number of iterations less than the original loop. So the control
92 constant number of iterations less than the original loop. So the92 part should be a set of insns clearly identified and having its
93 control part should be a set of insns clearly identified and having93 own iv, not otherwise used in the loop (at-least for now), which
94 its own iv, not otherwise used in the loop (at-least for now), which
95 initializes a register before the loop to the number of iterations.94 initializes a register before the loop to the number of iterations.
96 Currently SMS relies on the do-loop pattern to recognize such loops,95 Currently SMS relies on the do-loop pattern to recognize such loops,
97 where (1) the control part comprises of all insns defining and/or96 where (1) the control part comprises of all insns defining and/or
@@ -116,8 +115,10 @@
116115
117/* The number of different iterations the nodes in ps span, assuming116/* The number of different iterations the nodes in ps span, assuming
118 the stage boundaries are placed efficiently. */117 the stage boundaries are placed efficiently. */
119#define PS_STAGE_COUNT(ps) ((PS_MAX_CYCLE (ps) - PS_MIN_CYCLE (ps) \118#define CALC_STAGE_COUNT(max_cycle,min_cycle,ii) ((max_cycle - min_cycle \
120 + 1 + (ps)->ii - 1) / (ps)->ii)119 + 1 + ii - 1) / ii)
120/* The stage count of ps. */
121#define PS_STAGE_COUNT(ps) (((partial_schedule_ptr)(ps))->stage_count)
121122
122/* A single instruction in the partial schedule. */123/* A single instruction in the partial schedule. */
123struct ps_insn124struct ps_insn
@@ -155,6 +156,8 @@
155 int max_cycle;156 int max_cycle;
156157
157 ddg_ptr g; /* The DDG of the insns in the partial schedule. */158 ddg_ptr g; /* The DDG of the insns in the partial schedule. */
159
160 int stage_count; /* The stage count of the partial schedule. */
158};161};
159162
160/* We use this to record all the register replacements we do in163/* We use this to record all the register replacements we do in
@@ -195,7 +198,7 @@
195 rtx, rtx);198 rtx, rtx);
196static void duplicate_insns_of_cycles (partial_schedule_ptr,199static void duplicate_insns_of_cycles (partial_schedule_ptr,
197 int, int, int, rtx);200 int, int, int, rtx);
198201static int calculate_stage_count (partial_schedule_ptr ps);
199#define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)202#define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
200#define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)203#define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
201#define SCHED_FIRST_REG_MOVE(x) \204#define SCHED_FIRST_REG_MOVE(x) \
@@ -310,10 +313,10 @@
310 either a single (parallel) branch-on-count or a (non-parallel)313 either a single (parallel) branch-on-count or a (non-parallel)
311 branch immediately preceded by a single (decrement) insn. */314 branch immediately preceded by a single (decrement) insn. */
312 first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail315 first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail
313 : PREV_INSN (tail));316 : prev_nondebug_insn (tail));
314317
315 for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn))318 for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn))
316 if (reg_mentioned_p (reg, insn))319 if (reg_mentioned_p (reg, insn) && !DEBUG_INSN_P (insn))
317 {320 {
318 if (dump_file)321 if (dump_file)
319 {322 {
@@ -569,13 +572,12 @@
569 }572 }
570}573}
571574
572/* Bump the SCHED_TIMEs of all nodes to start from zero. Set the values575/* Bump the SCHED_TIMEs of all nodes by AMOUNT. Set the values of
573 of SCHED_ROW and SCHED_STAGE. */576 SCHED_ROW and SCHED_STAGE. */
574static void577static void
575normalize_sched_times (partial_schedule_ptr ps)578reset_sched_times (partial_schedule_ptr ps, int amount)
576{579{
577 int row;580 int row;
578 int amount = PS_MIN_CYCLE (ps);
579 int ii = ps->ii;581 int ii = ps->ii;
580 ps_insn_ptr crr_insn;582 ps_insn_ptr crr_insn;
581583
@@ -584,19 +586,43 @@
584 {586 {
585 ddg_node_ptr u = crr_insn->node;587 ddg_node_ptr u = crr_insn->node;
586 int normalized_time = SCHED_TIME (u) - amount;588 int normalized_time = SCHED_TIME (u) - amount;
589 int new_min_cycle = PS_MIN_CYCLE (ps) - amount;
590 int sc_until_cycle_zero, stage;
587591
588 if (dump_file)592 if (dump_file)
589 fprintf (dump_file, "crr_insn->node=%d, crr_insn->cycle=%d,\593 {
590 min_cycle=%d\n", crr_insn->node->cuid, SCHED_TIME594 /* Print the scheduling times after the rotation. */
591 (u), ps->min_cycle);595 fprintf (dump_file, "crr_insn->node=%d (insn id %d), "
596 "crr_insn->cycle=%d, min_cycle=%d", crr_insn->node->cuid,
597 INSN_UID (crr_insn->node->insn), SCHED_TIME (u),
598 normalized_time);
599 if (JUMP_P (crr_insn->node->insn))
600 fprintf (dump_file, " (branch)");
601 fprintf (dump_file, "\n");
602 }
603
592 gcc_assert (SCHED_TIME (u) >= ps->min_cycle);604 gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
593 gcc_assert (SCHED_TIME (u) <= ps->max_cycle);605 gcc_assert (SCHED_TIME (u) <= ps->max_cycle);
594 SCHED_TIME (u) = normalized_time;606 SCHED_TIME (u) = normalized_time;
595 SCHED_ROW (u) = normalized_time % ii;607 SCHED_ROW (u) = SMODULO (normalized_time, ii);
596 SCHED_STAGE (u) = normalized_time / ii;608
609 /* The calculation of stage count is done adding the number
610 of stages before cycle zero and after cycle zero. */
611 sc_until_cycle_zero = CALC_STAGE_COUNT (-1, new_min_cycle, ii);
612
613 if (SCHED_TIME (u) < 0)
614 {
615 stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii);
616 SCHED_STAGE (u) = sc_until_cycle_zero - stage;
617 }
618 else
619 {
620 stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii);
621 SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1;
622 }
597 }623 }
598}624}
599625
600/* Set SCHED_COLUMN of each node according to its position in PS. */626/* Set SCHED_COLUMN of each node according to its position in PS. */
601static void627static void
602set_columns_for_ps (partial_schedule_ptr ps)628set_columns_for_ps (partial_schedule_ptr ps)
@@ -646,9 +672,12 @@
646672
647 /* Do not duplicate any insn which refers to count_reg as it673 /* Do not duplicate any insn which refers to count_reg as it
648 belongs to the control part.674 belongs to the control part.
675 The closing branch is scheduled as well and thus should
676 be ignored.
649 TODO: This should be done by analyzing the control part of677 TODO: This should be done by analyzing the control part of
650 the loop. */678 the loop. */
651 if (reg_mentioned_p (count_reg, u_node->insn))679 if (reg_mentioned_p (count_reg, u_node->insn)
680 || JUMP_P (ps_ij->node->insn))
652 continue;681 continue;
653682
654 if (for_prolog)683 if (for_prolog)
@@ -1009,9 +1038,11 @@
1009 continue;1038 continue;
1010 }1039 }
10111040
1012 /* Don't handle BBs with calls or barriers, or !single_set insns,1041 /* Don't handle BBs with calls or barriers or auto-increment insns
1013 or auto-increment insns (to avoid creating invalid reg-moves1042 (to avoid creating invalid reg-moves for the auto-increment insns),
1014 for the auto-increment insns).1043 or !single_set with the exception of instructions that include
1044 count_reg---these instructions are part of the control part
1045 that do-loop recognizes.
1015 ??? Should handle auto-increment insns.1046 ??? Should handle auto-increment insns.
1016 ??? Should handle insns defining subregs. */1047 ??? Should handle insns defining subregs. */
1017 for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn))1048 for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn))
@@ -1021,7 +1052,8 @@
1021 if (CALL_P (insn)1052 if (CALL_P (insn)
1022 || BARRIER_P (insn)1053 || BARRIER_P (insn)
1023 || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)1054 || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
1024 && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)1055 && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE
1056 && !reg_mentioned_p (count_reg, insn))
1025 || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)1057 || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
1026 || (INSN_P (insn) && (set = single_set (insn))1058 || (INSN_P (insn) && (set = single_set (insn))
1027 && GET_CODE (SET_DEST (set)) == SUBREG))1059 && GET_CODE (SET_DEST (set)) == SUBREG))
@@ -1049,7 +1081,11 @@
1049 continue;1081 continue;
1050 }1082 }
10511083
1052 if (! (g = create_ddg (bb, 0)))1084 /* Always schedule the closing branch with the rest of the
1085 instructions. The branch is rotated to be in row ii-1 at the
1086 end of the scheduling procedure to make sure it's the last
1087 instruction in the iteration. */
1088 if (! (g = create_ddg (bb, 1)))
1053 {1089 {
1054 if (dump_file)1090 if (dump_file)
1055 fprintf (dump_file, "SMS create_ddg failed\n");1091 fprintf (dump_file, "SMS create_ddg failed\n");
@@ -1157,14 +1193,17 @@
11571193
1158 ps = sms_schedule_by_order (g, mii, maxii, node_order);1194 ps = sms_schedule_by_order (g, mii, maxii, node_order);
11591195
1160 if (ps){1196 if (ps)
1161 stage_count = PS_STAGE_COUNT (ps);1197 {
1162 gcc_assert(stage_count >= 1);1198 stage_count = calculate_stage_count (ps);
1163 }1199 gcc_assert(stage_count >= 1);
1200 PS_STAGE_COUNT(ps) = stage_count;
1201 }
11641202
1165 /* Stage count of 1 means that there is no interleaving between1203 /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
1166 iterations, let the scheduling passes do the job. */1204 1 means that there is no interleaving between iterations thus
1167 if (stage_count <= 11205 we let the scheduling passes do the job in this case. */
1206 if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
1168 || (count_init && (loop_count <= stage_count))1207 || (count_init && (loop_count <= stage_count))
1169 || (flag_branch_probabilities && (trip_count <= stage_count)))1208 || (flag_branch_probabilities && (trip_count <= stage_count)))
1170 {1209 {
@@ -1182,32 +1221,24 @@
1182 else1221 else
1183 {1222 {
1184 struct undo_replace_buff_elem *reg_move_replaces;1223 struct undo_replace_buff_elem *reg_move_replaces;
11851224 int amount = SCHED_TIME (g->closing_branch) + 1;
1186 if (dump_file)1225
1187 {1226 /* Set the stage boundaries. The closing_branch was scheduled
1227 and should appear in the last (ii-1) row. */
1228 reset_sched_times (ps, amount);
1229 rotate_partial_schedule (ps, amount);
1230 set_columns_for_ps (ps);
1231
1232 canon_loop (loop);
1233
1234 if (dump_file)
1235 {
1188 fprintf (dump_file,1236 fprintf (dump_file,
1189 "SMS succeeded %d %d (with ii, sc)\n", ps->ii,1237 "SMS succeeded %d %d (with ii, sc)\n", ps->ii,
1190 stage_count);1238 stage_count);
1191 print_partial_schedule (ps, dump_file);1239 print_partial_schedule (ps, dump_file);
1192 fprintf (dump_file,
1193 "SMS Branch (%d) will later be scheduled at cycle %d.\n",
1194 g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
1195 }1240 }
11961241
1197 /* Set the stage boundaries. If the DDG is built with closing_branch_deps,
1198 the closing_branch was scheduled and should appear in the last (ii-1)
1199 row. Otherwise, we are free to schedule the branch, and we let nodes
1200 that were scheduled at the first PS_MIN_CYCLE cycle appear in the first
1201 row; this should reduce stage_count to minimum.
1202 TODO: Revisit the issue of scheduling the insns of the
1203 control part relative to the branch when the control part
1204 has more than one insn. */
1205 normalize_sched_times (ps);
1206 rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
1207 set_columns_for_ps (ps);
1208
1209 canon_loop (loop);
1210
1211 /* case the BCT count is not known , Do loop-versioning */1242 /* case the BCT count is not known , Do loop-versioning */
1212 if (count_reg && ! count_init)1243 if (count_reg && ! count_init)
1213 {1244 {
@@ -1760,12 +1791,6 @@
1760 continue;1791 continue;
1761 }1792 }
17621793
1763 if (JUMP_P (insn)) /* Closing branch handled later. */
1764 {
1765 RESET_BIT (tobe_scheduled, u);
1766 continue;
1767 }
1768
1769 if (TEST_BIT (sched_nodes, u))1794 if (TEST_BIT (sched_nodes, u))
1770 continue;1795 continue;
17711796
@@ -1893,8 +1918,8 @@
1893 if (dump_file)1918 if (dump_file)
1894 fprintf (dump_file, "split_row=%d\n", split_row);1919 fprintf (dump_file, "split_row=%d\n", split_row);
18951920
1896 normalize_sched_times (ps);1921 reset_sched_times (ps, PS_MIN_CYCLE (ps));
1897 rotate_partial_schedule (ps, ps->min_cycle);1922 rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
18981923
1899 rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));1924 rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
1900 for (row = 0; row < split_row; row++)1925 for (row = 0; row < split_row; row++)
@@ -2571,6 +2596,7 @@
2571 ps_insn_ptr next_ps_i;2596 ps_insn_ptr next_ps_i;
2572 ps_insn_ptr first_must_follow = NULL;2597 ps_insn_ptr first_must_follow = NULL;
2573 ps_insn_ptr last_must_precede = NULL;2598 ps_insn_ptr last_must_precede = NULL;
2599 ps_insn_ptr last_in_row = NULL;
2574 int row;2600 int row;
25752601
2576 if (! ps_i)2602 if (! ps_i)
@@ -2597,8 +2623,37 @@
2597 else2623 else
2598 last_must_precede = next_ps_i;2624 last_must_precede = next_ps_i;
2599 }2625 }
2626 /* The closing branch must be the last in the row. */
2627 if (must_precede
2628 && TEST_BIT (must_precede, next_ps_i->node->cuid)
2629 && JUMP_P (next_ps_i->node->insn))
2630 return false;
2631
2632 last_in_row = next_ps_i;
2600 }2633 }
26012634
2635 /* The closing branch is scheduled as well. Make sure there is no
2636 dependent instruction after it as the branch should be the last
2637 instruction in the row. */
2638 if (JUMP_P (ps_i->node->insn))
2639 {
2640 if (first_must_follow)
2641 return false;
2642 if (last_in_row)
2643 {
2644 /* Make the branch the last in the row. New instructions
2645 will be inserted at the beginning of the row or after the
2646 last must_precede instruction thus the branch is guaranteed
2647 to remain the last instruction in the row. */
2648 last_in_row->next_in_row = ps_i;
2649 ps_i->prev_in_row = last_in_row;
2650 ps_i->next_in_row = NULL;
2651 }
2652 else
2653 ps->rows[row] = ps_i;
2654 return true;
2655 }
2656
2602 /* Now insert the node after INSERT_AFTER_PSI. */2657 /* Now insert the node after INSERT_AFTER_PSI. */
26032658
2604 if (! last_must_precede)2659 if (! last_must_precede)
@@ -2820,6 +2875,24 @@
2820 return ps_i;2875 return ps_i;
2821}2876}
28222877
2878/* Calculate the stage count of the partial schedule PS. The calculation
2879 takes into account the rotation to bring the closing branch to row
2880 ii-1. */
2881int
2882calculate_stage_count (partial_schedule_ptr ps)
2883{
2884 int rotation_amount = (SCHED_TIME (ps->g->closing_branch)) + 1;
2885 int new_min_cycle = PS_MIN_CYCLE (ps) - rotation_amount;
2886 int new_max_cycle = PS_MAX_CYCLE (ps) - rotation_amount;
2887 int stage_count = CALC_STAGE_COUNT (-1, new_min_cycle, ps->ii);
2888
2889 /* The calculation of stage count is done adding the number of stages
2890 before cycle zero and after cycle zero. */
2891 stage_count += CALC_STAGE_COUNT (new_max_cycle, 0, ps->ii);
2892
2893 return stage_count;
2894}
2895
2823/* Rotate the rows of PS such that insns scheduled at time2896/* Rotate the rows of PS such that insns scheduled at time
2824 START_CYCLE will appear in row 0. Updates max/min_cycles. */2897 START_CYCLE will appear in row 0. Updates max/min_cycles. */
2825void2898void
28262899
=== modified file 'gcc/params.def'
--- gcc/params.def 2011-04-18 11:31:29 +0000
+++ gcc/params.def 2011-05-11 10:02:07 +0000
@@ -344,6 +344,11 @@
344 "sms-max-ii-factor",344 "sms-max-ii-factor",
345 "A factor for tuning the upper bound that swing modulo scheduler uses for scheduling a loop",345 "A factor for tuning the upper bound that swing modulo scheduler uses for scheduling a loop",
346 100, 0, 0)346 100, 0, 0)
347/* The minimum value of stage count that swing modulo scheduler will generate. */
348DEFPARAM(PARAM_SMS_MIN_SC,
349 "sms-min-sc",
350 "The minimum value of stage count that swing modulo scheduler will generate.",
351 2, 1, 1)
347DEFPARAM(PARAM_SMS_DFA_HISTORY,352DEFPARAM(PARAM_SMS_DFA_HISTORY,
348 "sms-dfa-history",353 "sms-dfa-history",
349 "The number of cycles the swing modulo scheduler considers when checking conflicts using DFA",354 "The number of cycles the swing modulo scheduler considers when checking conflicts using DFA",

Subscribers

People subscribed via source and target branches