Merge lp:~eres/gcc-linaro/SMS_doloop_for_ARM into lp:gcc-linaro/4.5

Proposed by Revital Eres
Status: Merged
Approved by: Andrew Stubbs
Approved revision: no longer in the source branch.
Merged at revision: 99495
Proposed branch: lp:~eres/gcc-linaro/SMS_doloop_for_ARM
Merge into: lp:gcc-linaro/4.5
Diff against target: 791 lines (+415/-53)
7 files modified
ChangeLog.linaro (+32/-0)
gcc/config/arm/arm.md (+1/-1)
gcc/config/arm/thumb2.md (+54/-1)
gcc/ddg.c (+73/-2)
gcc/ddg.h (+1/-1)
gcc/loop-doloop.c (+67/-7)
gcc/modulo-sched.c (+187/-41)
To merge this branch: bzr merge lp:~eres/gcc-linaro/SMS_doloop_for_ARM
Reviewer Review Type Date Requested Status
Andrew Stubbs (community) Approve
Linaro Toolchain Builder Approve
Review via email: mp+54657@code.launchpad.net

Description of the change

Implement doloop pattern for ARM and support it in SMS.
The change consists of two patches that sent upstream:
1) Doloop pattern for ARM:
http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html (approved for trunk)
2) Support closing_branch_deps:
 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00350.html (NOT approved yet)

To post a comment you must log in.
Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild has taken a snapshot of this branch at r99490 and queued it for
build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/

and named something like gcc-linaro-4.5+bzr99490~eres~SMS_doloop_for_ARM.*

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.5+bzr99490~eres~SMS_doloop_for_ARM
cbuild-state: check

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on i686-lucid-cbuild93-scorpius-i686r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.5+bzr99490~eres~SMS_doloop_for_ARM/logs/i686-lucid-cbuild93-scorpius-i686r1

cbuild-checked: i686-lucid-cbuild93-scorpius-i686r1

review: Approve
Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on x86_64-maverick-cbuild93-crucis-x86_64r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.5+bzr99490~eres~SMS_doloop_for_ARM/logs/x86_64-maverick-cbuild93-crucis-x86_64r1

cbuild-checked: x86_64-maverick-cbuild93-crucis-x86_64r1

review: Approve
Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on armv7l-maverick-cbuild93-ursa2-cortexa9r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.5+bzr99490~eres~SMS_doloop_for_ARM/logs/armv7l-maverick-cbuild93-ursa2-cortexa9r1

cbuild-checked: armv7l-maverick-cbuild93-ursa2-cortexa9r1

review: Approve
Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on armv7l-maverick-cbuild93-ursa1-cortexa8r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.5+bzr99490~eres~SMS_doloop_for_ARM/logs/armv7l-maverick-cbuild93-ursa1-cortexa8r1

cbuild-checked: armv7l-maverick-cbuild93-ursa1-cortexa8r1

Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

The test results look ok, and there's nothing that jumps out at me from the patch, so yes, I approve of this.

Please go ahead and merge it to lp:gcc-linaro.

review: Approve
Revision history for this message
Ira Rosen (irar) wrote :

I am out of the office until 17/04/2011.

Note: This is an automated response to your message "[Merge]
lp:~eres/gcc-linaro/SMS_doloop_for_ARM into lp:gcc-linaro" sent on
14/4/2011 14:23:25.

This is the only notification you will receive while this person is away.

Revision history for this message
Michael Hope (michaelh1) wrote :

Hi Revital. I'm afraid we've missed the window for next weeks release. Could you merge this middle of next week when tip is open again?

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'ChangeLog.linaro'
2--- ChangeLog.linaro 2011-03-11 17:36:50 +0000
3+++ ChangeLog.linaro 2011-03-24 08:05:50 +0000
4@@ -1,3 +1,35 @@
5+2011-03-24 Revital Eres <revital.eres@linaro.org>
6+
7+ gcc/
8+ * loop-doloop.c (doloop_condition_get): Support new form of
9+ doloop pattern and use prev_nondebug_insn instead of PREV_INSN.
10+ * config/arm/thumb2.md (*thumb2_addsi3_compare0): Remove "*".
11+ (doloop_end): New.
12+ * config/arm/arm.md (*addsi3_compare0): Remove "*".
13+ * ddg.c (check_closing_branch_deps, get_node_of_insn_uid):
14+ New functions.
15+ (create_ddg): Pass sbitmap containing do-loop related
16+ instructions instead of closing_branch_deps parameter and call
17+ check_closing_branch_deps function.
18+ * ddg.h (create_ddg): Adjust the function declaration.
19+ * modulo-sched.c (PS_STAGE_COUNT): Rename to CALC_STAGE_COUNT
20+ and redefine.
21+ (doloop_register_get): Handle NONDEBUG_INSN_P.
22+ (stage_count): New field in struct partial_schedule.
23+ (mark_doloop_insns, calculate_stage_count): New functions.
24+ (normalize_sched_times): Rename to reset_sched_times and handle
25+ incrementing the sched time of the nodes by a constant value
26+ passed as parameter.
27+ (duplicate_insns_of_cycles): Skip closing branch.
28+ (sms_schedule_by_order): Schedule closing branch when
29+ closing_branch_deps is true.
30+ (ps_insn_find_column): Handle closing branch.
31+ (sms_schedule): Call reset_sched_times and handle case where
32+ do-loop pattern is not decoupled from the other loop instructions.
33+ Support new form of doloop pattern.
34+ (ps_insert_empty_row): Update calls to normalize_sched_times
35+ and rotate_partial_schedule functions.
36+
37 2011-03-10 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
38
39 LP:730440
40
41=== modified file 'gcc/config/arm/arm.md'
42--- gcc/config/arm/arm.md 2011-03-11 14:26:34 +0000
43+++ gcc/config/arm/arm.md 2011-03-24 08:05:50 +0000
44@@ -734,7 +734,7 @@
45 ""
46 )
47
48-(define_insn "*addsi3_compare0"
49+(define_insn "addsi3_compare0"
50 [(set (reg:CC_NOOV CC_REGNUM)
51 (compare:CC_NOOV
52 (plus:SI (match_operand:SI 1 "s_register_operand" "r, r")
53
54=== modified file 'gcc/config/arm/thumb2.md'
55--- gcc/config/arm/thumb2.md 2011-02-08 10:51:58 +0000
56+++ gcc/config/arm/thumb2.md 2011-03-24 08:05:50 +0000
57@@ -1194,7 +1194,7 @@
58 (set_attr "length" "2")]
59 )
60
61-(define_insn "*thumb2_addsi3_compare0"
62+(define_insn "thumb2_addsi3_compare0"
63 [(set (reg:CC_NOOV CC_REGNUM)
64 (compare:CC_NOOV
65 (plus:SI (match_operand:SI 1 "s_register_operand" "l, 0, r")
66@@ -1445,3 +1445,56 @@
67 [(set_attr "length" "4,4,16")
68 (set_attr "predicable" "yes")]
69 )
70+
71+
72+;; Define the subtract-one-and-jump insns so loop.c
73+;; knows what to generate.
74+(define_expand "doloop_end"
75+ [(use (match_operand 0 "" "")) ; loop pseudo
76+ (use (match_operand 1 "" "")) ; iterations; zero if unknown
77+ (use (match_operand 2 "" "")) ; max iterations
78+ (use (match_operand 3 "" "")) ; loop level
79+ (use (match_operand 4 "" ""))] ; label
80+ "TARGET_32BIT"
81+ "
82+ {
83+ /* Currently SMS relies on the do-loop pattern to recognize loops
84+ where (1) the control part consists of all insns defining and/or
85+ using a certain 'count' register and (2) the loop count can be
86+ adjusted by modifying this register prior to the loop.
87+ ??? The possible introduction of a new block to initialize the
88+ new IV can potentially affect branch optimizations. */
89+ if (optimize > 0 && flag_modulo_sched)
90+ {
91+ rtx s0;
92+ rtx bcomp;
93+ rtx loc_ref;
94+ rtx cc_reg;
95+ rtx insn;
96+ rtx cmp;
97+
98+ /* Only use this on innermost loops. */
99+ if (INTVAL (operands[3]) > 1)
100+ FAIL;
101+
102+ if (GET_MODE (operands[0]) != SImode)
103+ FAIL;
104+
105+ s0 = operands [0];
106+ if (TARGET_THUMB2)
107+ insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1)));
108+ else
109+ insn = emit_insn (gen_addsi3_compare0 (s0, s0, GEN_INT (-1)));
110+
111+ cmp = XVECEXP (PATTERN (insn), 0, 0);
112+ cc_reg = SET_DEST (cmp);
113+ bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
114+ loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [4]);
115+ emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
116+ gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
117+ loc_ref, pc_rtx)));
118+ DONE;
119+ }else
120+ FAIL;
121+ }")
122+
123
124=== modified file 'gcc/ddg.c'
125--- gcc/ddg.c 2010-07-19 08:58:53 +0000
126+++ gcc/ddg.c 2011-03-24 08:05:50 +0000
127@@ -60,6 +60,8 @@
128 static ddg_edge_ptr create_ddg_edge (ddg_node_ptr, ddg_node_ptr, dep_type,
129 dep_data_type, int, int);
130 static void add_edge_to_ddg (ddg_ptr g, ddg_edge_ptr);
131+static ddg_node_ptr get_node_of_insn_uid (ddg_ptr, int);
132+
133
134
135 /* Auxiliary variable for mem_read_insn_p/mem_write_insn_p. */
136 static bool mem_ref_p;
137@@ -450,12 +452,65 @@
138 sched_free_deps (head, tail, false);
139 }
140
141+/* Given DOLOOP_INSNS which holds the instructions that
142+ belong to the do-loop part; mark closing_branch_deps field in ddg G
143+ as TRUE if the do-loop part's instructions are dependent on the other
144+ loop instructions. Otherwise mark it as FALSE. */
145+static void
146+check_closing_branch_deps (ddg_ptr g, sbitmap doloop_insns)
147+{
148+ sbitmap_iterator sbi;
149+ unsigned int u = 0;
150+
151+ EXECUTE_IF_SET_IN_SBITMAP (doloop_insns, 0, u, sbi)
152+ {
153+ ddg_edge_ptr e;
154+ ddg_node_ptr u_node = get_node_of_insn_uid (g, u);
155+
156+ gcc_assert (u_node);
157+
158+ for (e = u_node->in; e != 0; e = e->next_in)
159+ {
160+ ddg_node_ptr v_node = e->src;
161+
162+ if (((unsigned int) INSN_UID (v_node->insn) == u)
163+ || DEBUG_INSN_P (v_node->insn))
164+ continue;
165+
166+ /* Ignore dependencies between memory writes and the
167+ jump. */
168+ if (JUMP_P (u_node->insn)
169+ && e->type == OUTPUT_DEP
170+ && mem_write_insn_p (v_node->insn))
171+ continue;
172+ if (!TEST_BIT (doloop_insns, INSN_UID (v_node->insn)))
173+ {
174+ g->closing_branch_deps = 1;
175+ return;
176+ }
177+ }
178+ for (e = u_node->out; e != 0; e = e->next_out)
179+ {
180+ ddg_node_ptr v_node = e->dest;
181+
182+ if (((unsigned int) INSN_UID (v_node->insn) == u)
183+ || DEBUG_INSN_P (v_node->insn))
184+ continue;
185+ if (!TEST_BIT (doloop_insns, INSN_UID (v_node->insn)))
186+ {
187+ g->closing_branch_deps = 1;
188+ return;
189+ }
190+ }
191+ }
192+ g->closing_branch_deps = 0;
193+}
194
195 /* Given a basic block, create its DDG and return a pointer to a variable
196 of ddg type that represents it.
197 Initialize the ddg structure fields to the appropriate values. */
198 ddg_ptr
199-create_ddg (basic_block bb, int closing_branch_deps)
200+create_ddg (basic_block bb, sbitmap doloop_insns)
201 {
202 ddg_ptr g;
203 rtx insn, first_note;
204@@ -465,7 +520,6 @@
205 g = (ddg_ptr) xcalloc (1, sizeof (struct ddg));
206
207 g->bb = bb;
208- g->closing_branch_deps = closing_branch_deps;
209
210 /* Count the number of insns in the BB. */
211 for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb));
212@@ -538,6 +592,11 @@
213 /* Build the data dependency graph. */
214 build_intra_loop_deps (g);
215 build_inter_loop_deps (g);
216+
217+ /* Check whether the do-loop part is decoupled from the other loop
218+ instructions. */
219+ check_closing_branch_deps (g, doloop_insns);
220+
221 return g;
222 }
223
224@@ -831,6 +890,18 @@
225 return NULL;
226 }
227
228+/* Given the uid of an instruction UID return the node that represents it. */
229+static ddg_node_ptr
230+get_node_of_insn_uid (ddg_ptr g, int uid)
231+{
232+ int i;
233+
234+ for (i = 0; i < g->num_nodes; i++)
235+ if (uid == INSN_UID (g->nodes[i].insn))
236+ return &g->nodes[i];
237+ return NULL;
238+}
239+
240 /* Given a set OPS of nodes in the DDG, find the set of their successors
241 which are not in OPS, and set their bits in SUCC. Bits corresponding to
242 OPS are cleared from SUCC. Leaves the other bits in SUCC unchanged. */
243
244=== modified file 'gcc/ddg.h'
245--- gcc/ddg.h 2009-11-25 10:55:54 +0000
246+++ gcc/ddg.h 2011-03-24 08:05:50 +0000
247@@ -167,7 +167,7 @@
248 };
249
250
251
252-ddg_ptr create_ddg (basic_block, int closing_branch_deps);
253+ddg_ptr create_ddg (basic_block, sbitmap);
254 void free_ddg (ddg_ptr);
255
256 void print_ddg (FILE *, ddg_ptr);
257
258=== modified file 'gcc/loop-doloop.c'
259--- gcc/loop-doloop.c 2010-07-19 08:58:53 +0000
260+++ gcc/loop-doloop.c 2011-03-24 08:05:50 +0000
261@@ -78,6 +78,8 @@
262 rtx inc_src;
263 rtx condition;
264 rtx pattern;
265+ rtx cc_reg = NULL_RTX;
266+ rtx reg_orig = NULL_RTX;
267
268 /* The canonical doloop pattern we expect has one of the following
269 forms:
270@@ -96,7 +98,16 @@
271 2) (set (reg) (plus (reg) (const_int -1))
272 (set (pc) (if_then_else (reg != 0)
273 (label_ref (label))
274- (pc))). */
275+ (pc))).
276+
277+ Some targets (ARM) do the comparison before the branch, as in the
278+ following form:
279+
280+ 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0)))
281+ (set (reg) (plus (reg) (const_int -1)))])
282+ (set (pc) (if_then_else (cc == NE)
283+ (label_ref (label))
284+ (pc))) */
285
286 pattern = PATTERN (doloop_pat);
287
288@@ -104,19 +115,47 @@
289 {
290 rtx cond;
291 rtx prev_insn = prev_nondebug_insn (doloop_pat);
292+ rtx cmp_arg1, cmp_arg2;
293+ rtx cmp_orig;
294
295- /* We expect the decrement to immediately precede the branch. */
296+ /* In case the pattern is not PARALLEL we expect two forms
297+ of doloop which are cases 2) and 3) above: in case 2) the
298+ decrement immediately precedes the branch, while in case 3)
299+ the compare and decrement instructions immediately precede
300+ the branch. */
301
302 if (prev_insn == NULL_RTX || !INSN_P (prev_insn))
303 return 0;
304
305 cmp = pattern;
306- inc = PATTERN (PREV_INSN (doloop_pat));
307+ if (GET_CODE (PATTERN (prev_insn)) == PARALLEL)
308+ {
309+ /* The third case: the compare and decrement instructions
310+ immediately precede the branch. */
311+ cmp_orig = XVECEXP (PATTERN (prev_insn), 0, 0);
312+ if (GET_CODE (cmp_orig) != SET)
313+ return 0;
314+ if (GET_CODE (SET_SRC (cmp_orig)) != COMPARE)
315+ return 0;
316+ cmp_arg1 = XEXP (SET_SRC (cmp_orig), 0);
317+ cmp_arg2 = XEXP (SET_SRC (cmp_orig), 1);
318+ if (cmp_arg2 != const0_rtx
319+ || GET_CODE (cmp_arg1) != PLUS)
320+ return 0;
321+ reg_orig = XEXP (cmp_arg1, 0);
322+ if (XEXP (cmp_arg1, 1) != GEN_INT (-1)
323+ || !REG_P (reg_orig))
324+ return 0;
325+ cc_reg = SET_DEST (cmp_orig);
326+
327+ inc = XVECEXP (PATTERN (prev_insn), 0, 1);
328+ }
329+ else
330+ inc = PATTERN (prev_insn);
331 /* We expect the condition to be of the form (reg != 0) */
332 cond = XEXP (SET_SRC (cmp), 0);
333 if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)
334 return 0;
335-
336 }
337 else
338 {
339@@ -162,11 +201,15 @@
340 return 0;
341
342 if ((XEXP (condition, 0) == reg)
343+ /* For the third case: */
344+ || ((cc_reg != NULL_RTX)
345+ && (XEXP (condition, 0) == cc_reg)
346+ && (reg_orig == reg))
347 || (GET_CODE (XEXP (condition, 0)) == PLUS
348- && XEXP (XEXP (condition, 0), 0) == reg))
349+ && XEXP (XEXP (condition, 0), 0) == reg))
350 {
351 if (GET_CODE (pattern) != PARALLEL)
352- /* The second form we expect:
353+ /* For the second form we expect:
354
355 (set (reg) (plus (reg) (const_int -1))
356 (set (pc) (if_then_else (reg != 0)
357@@ -181,7 +224,24 @@
358 (set (reg) (plus (reg) (const_int -1)))
359 (additional clobbers and uses)])
360
361- So we return that form instead.
362+ For the third form we expect:
363+
364+ (parallel [(set (cc) (compare ((plus (reg) (const_int -1)), 0))
365+ (set (reg) (plus (reg) (const_int -1)))])
366+ (set (pc) (if_then_else (cc == NE)
367+ (label_ref (label))
368+ (pc)))
369+
370+ which is equivalent to the following:
371+
372+ (parallel [(set (cc) (compare (reg, 1))
373+ (set (reg) (plus (reg) (const_int -1)))
374+ (set (pc) (if_then_else (NE == cc)
375+ (label_ref (label))
376+ (pc))))])
377+
378+ So we return the second form instead for the two cases.
379+
380 */
381 condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);
382
383
384=== modified file 'gcc/modulo-sched.c'
385--- gcc/modulo-sched.c 2009-11-25 10:55:54 +0000
386+++ gcc/modulo-sched.c 2011-03-24 08:05:50 +0000
387@@ -116,8 +116,10 @@
388
389 /* The number of different iterations the nodes in ps span, assuming
390 the stage boundaries are placed efficiently. */
391-#define PS_STAGE_COUNT(ps) ((PS_MAX_CYCLE (ps) - PS_MIN_CYCLE (ps) \
392- + 1 + (ps)->ii - 1) / (ps)->ii)
393+#define CALC_STAGE_COUNT(min_cycle,max_cycle,ii) ((max_cycle - min_cycle \
394+ + 1 + ii - 1) / ii)
395+/* The stage count of ps. */
396+#define PS_STAGE_COUNT(ps) (((partial_schedule_ptr)(ps))->stage_count)
397
398 /* A single instruction in the partial schedule. */
399 struct ps_insn
400@@ -155,6 +157,8 @@
401 int max_cycle;
402
403 ddg_ptr g; /* The DDG of the insns in the partial schedule. */
404+
405+ int stage_count; /* The stage count of the partial schedule. */
406 };
407
408 /* We use this to record all the register replacements we do in
409@@ -195,6 +199,7 @@
410 rtx, rtx);
411 static void duplicate_insns_of_cycles (partial_schedule_ptr,
412 int, int, int, rtx);
413+static int calculate_stage_count (partial_schedule_ptr ps);
414
415 #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
416 #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
417@@ -310,10 +315,10 @@
418 either a single (parallel) branch-on-count or a (non-parallel)
419 branch immediately preceded by a single (decrement) insn. */
420 first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail
421- : PREV_INSN (tail));
422+ : prev_nondebug_insn (tail));
423
424 for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn))
425- if (reg_mentioned_p (reg, insn))
426+ if (reg_mentioned_p (reg, insn) && NONDEBUG_INSN_P (insn))
427 {
428 if (dump_file)
429 {
430@@ -332,6 +337,24 @@
431 #endif
432 }
433
434+/* Mark in DOLOOP_INSNS the instructions that belong to the do-loop part.
435+ Use TAIL to recognize that part. */
436+static void
437+mark_doloop_insns (sbitmap doloop_insns, rtx tail)
438+{
439+ rtx first_insn_not_to_check, insn;
440+
441+ /* This is the first instruction which belongs the doloop part. */
442+ first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail
443+ : prev_nondebug_insn (tail));
444+
445+ sbitmap_zero (doloop_insns);
446+ for (insn = first_insn_not_to_check; insn != NEXT_INSN (tail);
447+ insn = NEXT_INSN (insn))
448+ if (NONDEBUG_INSN_P (insn))
449+ SET_BIT (doloop_insns, INSN_UID (insn));
450+}
451+
452 /* Check if COUNT_REG is set to a constant in the PRE_HEADER block, so
453 that the number of iterations is a compile-time constant. If so,
454 return the rtx that sets COUNT_REG to a constant, and set COUNT to
455@@ -569,13 +592,12 @@
456 }
457 }
458
459-/* Bump the SCHED_TIMEs of all nodes to start from zero. Set the values
460- of SCHED_ROW and SCHED_STAGE. */
461+/* Bump the SCHED_TIMEs of all nodes by AMOUNT. Set the values of
462+ SCHED_ROW and SCHED_STAGE. */
463 static void
464-normalize_sched_times (partial_schedule_ptr ps)
465+reset_sched_times (partial_schedule_ptr ps, int amount)
466 {
467 int row;
468- int amount = PS_MIN_CYCLE (ps);
469 int ii = ps->ii;
470 ps_insn_ptr crr_insn;
471
472@@ -584,6 +606,10 @@
473 {
474 ddg_node_ptr u = crr_insn->node;
475 int normalized_time = SCHED_TIME (u) - amount;
476+ int new_min_cycle = PS_MIN_CYCLE (ps) - amount;
477+ /* The first cycle in row zero after the rotation. */
478+ int new_first_cycle_in_row_zero =
479+ new_min_cycle + ii - SMODULO (new_min_cycle, ii);
480
481 if (dump_file)
482 fprintf (dump_file, "crr_insn->node=%d, crr_insn->cycle=%d,\
483@@ -592,8 +618,30 @@
484 gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
485 gcc_assert (SCHED_TIME (u) <= ps->max_cycle);
486 SCHED_TIME (u) = normalized_time;
487- SCHED_ROW (u) = normalized_time % ii;
488- SCHED_STAGE (u) = normalized_time / ii;
489+ crr_insn->cycle = normalized_time;
490+ SCHED_ROW (u) = SMODULO (normalized_time, ii);
491+
492+ /* If min_cycle is in row zero after the rotation then
493+ the stage count can be calculated by dividing the cycle
494+ with ii. Otherwise, the calculation is done by dividing the
495+ SMSed kernel into two intervals:
496+
497+ 1) min_cycle <= interval 0 < first_cycle_in_row_zero
498+ 2) first_cycle_in_row_zero <= interval 1 < max_cycle
499+
500+ Cycles in interval 0 are in stage 0. The stage of cycles
501+ in interval 1 should be added by 1 to take interval 0 into
502+ account. */
503+ if (SMODULO (new_min_cycle, ii) == 0)
504+ SCHED_STAGE (u) = normalized_time / ii;
505+ else
506+ {
507+ if (crr_insn->cycle < new_first_cycle_in_row_zero)
508+ SCHED_STAGE (u) = 0;
509+ else
510+ SCHED_STAGE (u) =
511+ ((SCHED_TIME (u) - new_first_cycle_in_row_zero) / ii) + 1;
512+ }
513 }
514 }
515
516@@ -646,9 +694,12 @@
517
518 /* Do not duplicate any insn which refers to count_reg as it
519 belongs to the control part.
520+ If closing_branch_deps is true the closing branch is scheduled
521+ as well and thus should be ignored.
522 TODO: This should be done by analyzing the control part of
523 the loop. */
524- if (reg_mentioned_p (count_reg, u_node->insn))
525+ if (reg_mentioned_p (count_reg, u_node->insn)
526+ || JUMP_P (ps_ij->node->insn))
527 continue;
528
529 if (for_prolog)
530@@ -894,7 +945,8 @@
531 basic_block condition_bb = NULL;
532 edge latch_edge;
533 gcov_type trip_count = 0;
534-
535+ sbitmap doloop_insns;
536+
537 loop_optimizer_init (LOOPS_HAVE_PREHEADERS
538 | LOOPS_HAVE_RECORDED_EXITS);
539 if (number_of_loops () <= 1)
540@@ -919,6 +971,7 @@
541 setup_sched_infos ();
542 haifa_sched_init ();
543
544+ doloop_insns = sbitmap_alloc (get_max_uid () + 1);
545 /* Allocate memory to hold the DDG array one entry for each loop.
546 We use loop->num as index into this array. */
547 g_arr = XCNEWVEC (ddg_ptr, number_of_loops ());
548@@ -1009,9 +1062,11 @@
549 continue;
550 }
551
552- /* Don't handle BBs with calls or barriers, or !single_set insns,
553- or auto-increment insns (to avoid creating invalid reg-moves
554- for the auto-increment insns).
555+ /* Don't handle BBs with calls or barriers or auto-increment insns
556+ (to avoid creating invalid reg-moves for the auto-increment insns),
557+ or !single_set with the exception of instructions that include
558+ count_reg---these instructions are part of the control part
559+ that do-loop recognizes.
560 ??? Should handle auto-increment insns.
561 ??? Should handle insns defining subregs. */
562 for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn))
563@@ -1021,7 +1076,8 @@
564 if (CALL_P (insn)
565 || BARRIER_P (insn)
566 || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
567- && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)
568+ && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE
569+ && !reg_mentioned_p (count_reg, insn))
570 || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
571 || (INSN_P (insn) && (set = single_set (insn))
572 && GET_CODE (SET_DEST (set)) == SUBREG))
573@@ -1048,14 +1104,16 @@
574
575 continue;
576 }
577-
578- if (! (g = create_ddg (bb, 0)))
579+ mark_doloop_insns (doloop_insns, tail);
580+ if (! (g = create_ddg (bb, doloop_insns)))
581 {
582 if (dump_file)
583 fprintf (dump_file, "SMS create_ddg failed\n");
584 continue;
585 }
586-
587+ if (dump_file)
588+ fprintf (dump_file, "SMS closing_branch_deps: %d\n",
589+ g->closing_branch_deps);
590 g_arr[loop->num] = g;
591 if (dump_file)
592 fprintf (dump_file, "...OK\n");
593@@ -1157,11 +1215,13 @@
594
595 ps = sms_schedule_by_order (g, mii, maxii, node_order);
596
597- if (ps){
598- stage_count = PS_STAGE_COUNT (ps);
599- gcc_assert(stage_count >= 1);
600- }
601-
602+ if (ps)
603+ {
604+ stage_count = calculate_stage_count (ps);
605+ gcc_assert(stage_count >= 1);
606+ PS_STAGE_COUNT(ps) = stage_count;
607+ }
608+
609 /* Stage count of 1 means that there is no interleaving between
610 iterations, let the scheduling passes do the job. */
611 if (stage_count <= 1
612@@ -1182,17 +1242,7 @@
613 else
614 {
615 struct undo_replace_buff_elem *reg_move_replaces;
616-
617- if (dump_file)
618- {
619- fprintf (dump_file,
620- "SMS succeeded %d %d (with ii, sc)\n", ps->ii,
621- stage_count);
622- print_partial_schedule (ps, dump_file);
623- fprintf (dump_file,
624- "SMS Branch (%d) will later be scheduled at cycle %d.\n",
625- g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
626- }
627+ int amount;
628
629 /* Set the stage boundaries. If the DDG is built with closing_branch_deps,
630 the closing_branch was scheduled and should appear in the last (ii-1)
631@@ -1202,12 +1252,28 @@
632 TODO: Revisit the issue of scheduling the insns of the
633 control part relative to the branch when the control part
634 has more than one insn. */
635- normalize_sched_times (ps);
636- rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
637+ amount = (g->closing_branch_deps)? SCHED_TIME (g->closing_branch) + 1:
638+ PS_MIN_CYCLE (ps);
639+ reset_sched_times (ps, amount);
640+ rotate_partial_schedule (ps, amount);
641+
642 set_columns_for_ps (ps);
643
644 canon_loop (loop);
645
646+ if (dump_file)
647+ {
648+ fprintf (dump_file,
649+ "SMS succeeded %d %d (with ii, sc)\n", ps->ii,
650+ stage_count);
651+ print_partial_schedule (ps, dump_file);
652+ if (!g->closing_branch_deps)
653+ fprintf (dump_file,
654+ "SMS Branch (%d) will later be scheduled at \
655+ cycle %d.\n",
656+ g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
657+ }
658+
659 /* case the BCT count is not known , Do loop-versioning */
660 if (count_reg && ! count_init)
661 {
662@@ -1252,6 +1318,7 @@
663 }
664
665 free (g_arr);
666+ sbitmap_free (doloop_insns);
667
668 /* Release scheduler data, needed until now because of DFA. */
669 haifa_sched_finish ();
670@@ -1759,8 +1826,9 @@
671 RESET_BIT (tobe_scheduled, u);
672 continue;
673 }
674-
675- if (JUMP_P (insn)) /* Closing branch handled later. */
676+ /* Closing branch handled later unless closing_branch_deps
677+ is true. */
678+ if (JUMP_P (insn) && !g->closing_branch_deps)
679 {
680 RESET_BIT (tobe_scheduled, u);
681 continue;
682@@ -1893,8 +1961,8 @@
683 if (dump_file)
684 fprintf (dump_file, "split_row=%d\n", split_row);
685
686- normalize_sched_times (ps);
687- rotate_partial_schedule (ps, ps->min_cycle);
688+ reset_sched_times (ps, PS_MIN_CYCLE (ps));
689+ rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
690
691 rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
692 for (row = 0; row < split_row; row++)
693@@ -2571,6 +2639,7 @@
694 ps_insn_ptr next_ps_i;
695 ps_insn_ptr first_must_follow = NULL;
696 ps_insn_ptr last_must_precede = NULL;
697+ ps_insn_ptr last_in_row = NULL;
698 int row;
699
700 if (! ps_i)
701@@ -2597,8 +2666,37 @@
702 else
703 last_must_precede = next_ps_i;
704 }
705+ /* The closing branch must be the last in the row. */
706+ if (must_precede
707+ && TEST_BIT (must_precede, next_ps_i->node->cuid)
708+ && JUMP_P (next_ps_i->node->insn))
709+ return false;
710+
711+ last_in_row = next_ps_i;
712 }
713
714+ /* If closing_branch_deps is true we are scheduling the closing
715+ branch as well. Make sure there is no dependent instruction after
716+ it as the branch should be the last instruction. */
717+ if (JUMP_P (ps_i->node->insn))
718+ {
719+ if (first_must_follow)
720+ return false;
721+ if (last_in_row)
722+ {
723+ /* Make the branch the last in the row. New instructions
724+ will be inserted at the beginning of the row or after the
725+ last must_precede instruction thus the branch is guaranteed
726+ to remain the last instruction in the row. */
727+ last_in_row->next_in_row = ps_i;
728+ ps_i->prev_in_row = last_in_row;
729+ ps_i->next_in_row = NULL;
730+ }
731+ else
732+ ps->rows[row] = ps_i;
733+ return true;
734+ }
735+
736 /* Now insert the node after INSERT_AFTER_PSI. */
737
738 if (! last_must_precede)
739@@ -2820,6 +2918,54 @@
740 return ps_i;
741 }
742
743+/* Calculate the stage count of the partial schedule PS. */
744+int
745+calculate_stage_count (partial_schedule_ptr ps)
746+{
747+ int stage_count;
748+
749+ /* If closing_branch_deps is false then the stage
750+ boundaries are placed efficiently, meaning that min_cycle will be
751+ placed at row 0. Otherwise, the closing branch will be placed in
752+ row ii-1. For the later case we assume the final SMSed kernel can
753+ be divided into two intervals. This assumption is used for the
754+ stage count calculation:
755+
756+ 1) min_cycle <= interval 0 < first_cycle_in_row_zero
757+ 2) first_cycle_in_row_zero <= interval 1 < max_cycle
758+ */
759+ stage_count =
760+ CALC_STAGE_COUNT (PS_MIN_CYCLE (ps), PS_MAX_CYCLE (ps), ps->ii);
761+ if (ps->g->closing_branch_deps)
762+ {
763+ int new_min_cycle;
764+ int new_min_cycle_row;
765+ int rotation_amount = SCHED_TIME (ps->g->closing_branch) + 1;
766+
767+ /* This is the new value of min_cycle after the final rotation to
768+ bring closing branch into row ii-1. */
769+ new_min_cycle = PS_MIN_CYCLE (ps) - rotation_amount;
770+ /* This is the row which the the new min_cycle will be placed in. */
771+ new_min_cycle_row = SMODULO (new_min_cycle, ps->ii);
772+ /* If the row of min_cycle is zero then interval 0 is empty.
773+ Otherwise, we need to calculate interval 1 and add it by one
774+ to take interval 0 into account. */
775+ if (new_min_cycle_row != 0)
776+ {
777+ int new_max_cycle, first_cycle_in_row_zero;
778+
779+ new_max_cycle = PS_MAX_CYCLE (ps) - rotation_amount;
780+ first_cycle_in_row_zero =
781+ new_min_cycle + ps->ii - new_min_cycle_row;
782+
783+ stage_count =
784+ CALC_STAGE_COUNT (first_cycle_in_row_zero, new_max_cycle,
785+ ps->ii) + 1;
786+ }
787+ }
788+ return stage_count;
789+}
790+
791 /* Rotate the rows of PS such that insns scheduled at time
792 START_CYCLE will appear in row 0. Updates max/min_cycles. */
793 void

Subscribers

People subscribed via source and target branches