Merge lp:~irar/gcc-linaro/vect_widen_mult_4.6 into lp:gcc-linaro/4.6

Proposed by Ira Rosen
Status: Superseded
Proposed branch: lp:~irar/gcc-linaro/vect_widen_mult_4.6
Merge into: lp:gcc-linaro/4.6
Diff against target: 1361 lines (+656/-194)
12 files modified
ChangeLog.linaro (+77/-0)
gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c (+60/-0)
gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c (+77/-0)
gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c (+4/-6)
gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c (+4/-3)
gcc/testsuite/lib/target-supports.exp (+50/-2)
gcc/tree-vect-loop-manip.c (+0/-36)
gcc/tree-vect-loop.c (+63/-25)
gcc/tree-vect-patterns.c (+187/-94)
gcc/tree-vect-slp.c (+2/-0)
gcc/tree-vect-stmts.c (+131/-27)
gcc/tree-vectorizer.h (+1/-1)
To merge this branch: bzr merge lp:~irar/gcc-linaro/vect_widen_mult_4.6
Reviewer: Linaro Toolchain Developers
Status: Pending
Review via email: mp+65443@code.launchpad.net

This proposal supersedes a proposal from 2011-06-19.

This proposal has been superseded by a proposal from 2011-06-22.

Description of the change

Improve vectorization of widening multiplication (widen-mult): support unsigned operands and multiplication by a constant. Also fix an old bug that is exposed once constants are supported.

This is a backport from FSF.

Also includes a fix to match 4.6 and another bug fix.
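
As a rough illustration of the loop shapes this change targets, here is a minimal sketch in C. The function names and the use of <stdint.h> types are assumptions made for this example; the constant 2333 is borrowed from the new tests. Whether a given loop is actually vectorized with a widening multiply depends on the target and compiler options.

```c
#include <stdint.h>

/* Hypothetical example, not part of the patch.
   Unsigned widening multiply: two 16-bit inputs, one 32-bit result.
   The cast makes the multiplication unsigned, so it cannot overflow
   a signed int when both inputs are large.  */
void
widen_mult_u16 (uint32_t *restrict out, const uint16_t *restrict x,
                const uint16_t *restrict y, int n)
{
  int i;

  for (i = 0; i < n; i++)
    out[i] = (uint32_t) x[i] * y[i];
}

/* Hypothetical example, not part of the patch.
   Widening multiply by a constant: 16-bit input, 32-bit result.
   2333 fits in 16 bits, so the constant can be treated as having
   the narrow type of the other operand.  */
void
widen_mult_const_s16 (int32_t *restrict out, const int16_t *restrict in,
                      int n)
{
  int i;

  for (i = 0; i < n; i++)
    out[i] = in[i] * 2333;
}
```

Both functions can be called from any test driver; compiling with the vectorizer dump enabled should show whether the pattern was recognized on a given target.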

Linaro Toolchain Builder (cbuild) wrote : Posted in a previous version of this proposal

cbuild has taken a snapshot of this branch at r106759 and queued it for build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106759~irar~vect_widen_mult_4.6.tar.xdelta3.xz

and will be built on the following builders:
 a9-builder armv5-builder i686 x86_64

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106759~irar~vect_widen_mult_4.6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106758
cbuild-state: check

Linaro Toolchain Builder (cbuild) wrote : Posted in a previous version of this proposal

cbuild successfully built this on i686-lucid-cbuild132-scorpius-i686r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106759~irar~vect_widen_mult_4.6/logs/i686-lucid-cbuild132-scorpius-i686r1

The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106758:
 -PASS: gcc.dg/vect/vect-multitypes-7.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-multitypes-7.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-multitypes-7.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-multitypes-7.c scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto scan-tree-dump-times vect "vectorized 1 loops" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto (test for excess errors)
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect "vectorized 1 loops" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c (test for excess errors)
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto scan-tree-dump-times vect "vectorized 1 loops" 3
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto (test for excess errors)
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect "vectorized 1 loops" 3
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect "vect_recog_widen_mu...

Linaro Toolchain Builder (cbuild) wrote : Posted in a previous version of this proposal

cbuild successfully built this on x86_64-maverick-cbuild132-crucis-x86_64r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106759~irar~vect_widen_mult_4.6/logs/x86_64-maverick-cbuild132-crucis-x86_64r1

The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106758:
 -PASS: gcc.dg/vect/vect-multitypes-7.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-multitypes-7.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-multitypes-7.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-multitypes-7.c scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c -flto scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 -PASS: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vectorized 1 loops" 1
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto scan-tree-dump-times vect "vectorized 1 loops" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c -flto (test for excess errors)
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect "vectorized 1 loops" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-s16.c (test for excess errors)
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto execution test
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto scan-tree-dump-times vect "vectorized 1 loops" 3
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c -flto (test for excess errors)
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect "pattern recognized" 2
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect "vectorized 1 loops" 3
 +PASS: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect "vect_reco...

Ira Rosen (irar) wrote : Posted in a previous version of this proposal

The failures are caused by differences in GIMPLE between 4.6 and 4.7.
I'll fix it and resubmit.

Ira Rosen (irar) wrote : Posted in a previous version of this proposal

OK, I have a fix, but I don't know what to do now. Should I delete this proposal and delete the branch (how?), or is there a nice way to resubmit with changes?

Thanks,
Ira

Michael Hope (michaelh1) wrote : Posted in a previous version of this proposal

Hi Ira. It's not bad - I did this myself recently. Here's what you do.

 * Make the fixes in your lp:~irar/gcc-linaro/vect_widen_mult_4.6 branch
 * Commit and push
 * Click on 'Resubmit proposal' at the top right of the merge request page

See https://code.launchpad.net/~michaelh1/gcc-linaro/lp723185/+merge/63965 for one of mine where I did this.

Preview Diff

1=== modified file 'ChangeLog.linaro'
2--- ChangeLog.linaro 2011-06-20 11:32:06 +0000
3+++ ChangeLog.linaro 2011-06-22 12:20:57 +0000
4@@ -1,3 +1,80 @@
5+2011-06-22 Ira Rosen <ira.rosen@linaro.org>
6+
7+ Backport from FSF:
8+
9+ 2011-06-07 Ira Rosen <ira.rosen@linaro.org>
10+
11+ gcc/
12+ * tree-vectorizer.h (vect_recog_func_ptr): Make last argument to be
13+ a pointer.
14+ * tree-vect-patterns.c (vect_recog_widen_sum_pattern,
15+ vect_recog_widen_mult_pattern, vect_recog_dot_prod_pattern,
16+ vect_recog_pow_pattern): Likewise.
17+ (vect_pattern_recog_1): Remove declaration.
18+ (widened_name_p): Remove declaration. Add new argument to specify
19+ whether to check that both types are either signed or unsigned.
20+ (vect_recog_widen_mult_pattern): Update documentation. Handle
21+ unsigned patterns and multiplication by constants.
22+ (vect_pattern_recog_1): Update vect_recog_func references. Use
23+ statement information from the statement returned from pattern
24+ detection functions.
25+ (vect_pattern_recog): Update vect_recog_func reference.
26+ * tree-vect-stmts.c (vectorizable_type_promotion): For widening
27+ multiplication by a constant use the type of the other operand.
28+
29+ gcc/testsuite
30+ * lib/target-supports.exp
31+ (check_effective_target_vect_widen_mult_qi_to_hi):
32+ Add NEON as supporting target.
33+ (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
34+ (check_effective_target_vect_widen_mult_qi_to_hi_pattern): New.
35+ (check_effective_target_vect_widen_mult_hi_to_si_pattern): New.
36+ * gcc.dg/vect/vect-widen-mult-u8.c: Expect to be vectorized
37+ using widening multiplication on targets that support it.
38+ * gcc.dg/vect/vect-widen-mult-u16.c: Likewise.
39+ * gcc.dg/vect/vect-widen-mult-const-s16.c: New test.
40+ * gcc.dg/vect/vect-widen-mult-const-u16.c: New test.
41+
42+ and
43+
44+ 2011-06-15 Ira Rosen <ira.rosen@linaro.org>
45+
46+ gcc/
47+ * tree-vect-loop-manip.c (remove_dead_stmts_from_loop): Remove.
48+ (slpeel_tree_peel_loop_to_edge): Don't call
49+ remove_dead_stmts_from_loop.
50+ * tree-vect-loop.c (vect_determine_vectorization_factor): Don't
51+ remove irrelevant pattern statements. For irrelevant statements
52+ check if it is the last statement of a detected pattern, use
53+ corresponding pattern statement instead.
54+ (destroy_loop_vec_info): No need to remove pattern statements,
55+ only free stmt_vec_info.
56+ (vect_transform_loop): For irrelevant statements check if it is
57+ the last statement of a detected pattern, use corresponding
58+ pattern statement instead.
59+ * tree-vect-patterns.c (vect_pattern_recog_1): Don't insert
60+ pattern statements. Set basic block for the new statement.
61+ (vect_pattern_recog): Update documentation.
62+ * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Scan
63+ operands of pattern statements.
64+ (vectorizable_call): Fix printing. In case of a pattern statement
65+ use the lhs of the original statement when creating a dummy
66+ statement to replace the original call.
67+ (vect_analyze_stmt): For irrelevant statements check if it is
68+ the last statement of a detected pattern, use corresponding
69+ pattern statement instead.
70+ * tree-vect-slp.c (vect_schedule_slp_instance): For pattern
71+ statements use gsi of the original statement.
72+
73+ and
74+ 2011-06-21 Ira Rosen <ira.rosen@linaro.org>
75+
76+ PR tree-optimization/49478
77+ gcc/
78+
79+ * tree-vect-loop.c (vectorizable_reduction): Handle DOT_PROD_EXPR
80+ with constant operand.
81+
82 2011-06-20 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
83
84 gcc/
85
86=== added file 'gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c'
87--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c 1970-01-01 00:00:00 +0000
88+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c 2011-06-22 12:20:57 +0000
89@@ -0,0 +1,60 @@
90+/* { dg-require-effective-target vect_int } */
91+
92+#include "tree-vect.h"
93+#include <stdlib.h>
94+
95+#define N 32
96+
97+__attribute__ ((noinline)) void
98+foo (int *__restrict a,
99+ short *__restrict b,
100+ int n)
101+{
102+ int i;
103+
104+ for (i = 0; i < n; i++)
105+ a[i] = b[i] * 2333;
106+
107+ for (i = 0; i < n; i++)
108+ if (a[i] != b[i] * 2333)
109+ abort ();
110+}
111+
112+__attribute__ ((noinline)) void
113+bar (int *__restrict a,
114+ short *__restrict b,
115+ int n)
116+{
117+ int i;
118+
119+ for (i = 0; i < n; i++)
120+ a[i] = b[i] * (short) 2333;
121+
122+ for (i = 0; i < n; i++)
123+ if (a[i] != b[i] * (short) 2333)
124+ abort ();
125+}
126+
127+int main (void)
128+{
129+ int i;
130+ int a[N];
131+ short b[N];
132+
133+ for (i = 0; i < N; i++)
134+ {
135+ a[i] = 0;
136+ b[i] = i;
137+ __asm__ volatile ("");
138+ }
139+
140+ foo (a, b, N);
141+ bar (a, b, N);
142+ return 0;
143+}
144+
145+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_widen_mult_hi_to_si } } } */
146+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
147+/* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
148+/* { dg-final { cleanup-tree-dump "vect" } } */
149+
150
151=== added file 'gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c'
152--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c 1970-01-01 00:00:00 +0000
153+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c 2011-06-22 12:20:57 +0000
154@@ -0,0 +1,77 @@
155+/* { dg-require-effective-target vect_int } */
156+
157+#include "tree-vect.h"
158+#include <stdlib.h>
159+
160+#define N 32
161+
162+__attribute__ ((noinline)) void
163+foo (unsigned int *__restrict a,
164+ unsigned short *__restrict b,
165+ int n)
166+{
167+ int i;
168+
169+ for (i = 0; i < n; i++)
170+ a[i] = b[i] * 2333;
171+
172+ for (i = 0; i < n; i++)
173+ if (a[i] != b[i] * 2333)
174+ abort ();
175+}
176+
177+__attribute__ ((noinline)) void
178+bar (unsigned int *__restrict a,
179+ unsigned short *__restrict b,
180+ int n)
181+{
182+ int i;
183+
184+ for (i = 0; i < n; i++)
185+ a[i] = (unsigned short) 2333 * b[i];
186+
187+ for (i = 0; i < n; i++)
188+ if (a[i] != b[i] * (unsigned short) 2333)
189+ abort ();
190+}
191+
192+__attribute__ ((noinline)) void
193+baz (unsigned int *__restrict a,
194+ unsigned short *__restrict b,
195+ int n)
196+{
197+ int i;
198+
199+ for (i = 0; i < n; i++)
200+ a[i] = b[i] * 233333333;
201+
202+ for (i = 0; i < n; i++)
203+ if (a[i] != b[i] * 233333333)
204+ abort ();
205+}
206+
207+
208+int main (void)
209+{
210+ int i;
211+ unsigned int a[N];
212+ unsigned short b[N];
213+
214+ for (i = 0; i < N; i++)
215+ {
216+ a[i] = 0;
217+ b[i] = i;
218+ __asm__ volatile ("");
219+ }
220+
221+ foo (a, b, N);
222+ bar (a, b, N);
223+ baz (a, b, N);
224+ return 0;
225+}
226+
227+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect" { target vect_widen_mult_hi_to_si } } } */
228+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
229+/* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
230+/* { dg-final { cleanup-tree-dump "vect" } } */
231+
232
233=== modified file 'gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c'
234--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c 2010-05-27 12:23:45 +0000
235+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c 2011-06-22 12:20:57 +0000
236@@ -9,13 +9,11 @@
237 unsigned short Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
238 unsigned int result[N];
239
240-/* short->int widening-mult */
241+/* unsigned short->unsigned int widening-mult. */
242 __attribute__ ((noinline)) int
243 foo1(int len) {
244 int i;
245
246- /* Not vectorized because X[i] and Y[i] are casted to 'int'
247- so the widening multiplication pattern is not recognized. */
248 for (i=0; i<len; i++) {
249 result[i] = (unsigned int)(X[i] * Y[i]);
250 }
251@@ -43,8 +41,8 @@
252 return 0;
253 }
254
255-/*The induction loop is vectorized */
256-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
257-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
258+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
259+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
260+/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
261 /* { dg-final { cleanup-tree-dump "vect" } } */
262
263
264=== modified file 'gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c'
265--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c 2009-05-08 12:39:01 +0000
266+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c 2011-06-22 12:20:57 +0000
267@@ -9,7 +9,7 @@
268 unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
269 unsigned short result[N];
270
271-/* char->short widening-mult */
272+/* unsigned char-> unsigned short widening-mult. */
273 __attribute__ ((noinline)) int
274 foo1(int len) {
275 int i;
276@@ -28,8 +28,7 @@
277 for (i=0; i<N; i++) {
278 X[i] = i;
279 Y[i] = 64-i;
280- if (i%4 == 0)
281- X[i] = 5;
282+ __asm__ volatile ("");
283 }
284
285 foo1 (N);
286@@ -43,5 +42,7 @@
287 }
288
289 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
290+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
291+/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
292 /* { dg-final { cleanup-tree-dump "vect" } } */
293
294
295=== modified file 'gcc/testsuite/lib/target-supports.exp'
296--- gcc/testsuite/lib/target-supports.exp 2011-06-02 12:12:00 +0000
297+++ gcc/testsuite/lib/target-supports.exp 2011-06-22 12:20:57 +0000
298@@ -2663,7 +2663,8 @@
299 } else {
300 set et_vect_widen_mult_qi_to_hi_saved 0
301 }
302- if { [istarget powerpc*-*-*] } {
303+ if { [istarget powerpc*-*-*]
304+ || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
305 set et_vect_widen_mult_qi_to_hi_saved 1
306 }
307 }
308@@ -2696,7 +2697,8 @@
309 || [istarget spu-*-*]
310 || [istarget ia64-*-*]
311 || [istarget i?86-*-*]
312- || [istarget x86_64-*-*] } {
313+ || [istarget x86_64-*-*]
314+ || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
315 set et_vect_widen_mult_hi_to_si_saved 1
316 }
317 }
318@@ -2705,6 +2707,52 @@
319 }
320
321 # Return 1 if the target plus current options supports a vector
322+# widening multiplication of *char* args into *short* result, 0 otherwise.
323+#
324+# This won't change for different subtargets so cache the result.
325+
326+proc check_effective_target_vect_widen_mult_qi_to_hi_pattern { } {
327+ global et_vect_widen_mult_qi_to_hi_pattern
328+
329+ if [info exists et_vect_widen_mult_qi_to_hi_pattern_saved] {
330+ verbose "check_effective_target_vect_widen_mult_qi_to_hi_pattern: using cached result" 2
331+ } else {
332+ set et_vect_widen_mult_qi_to_hi_pattern_saved 0
333+ if { [istarget powerpc*-*-*]
334+ || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
335+ set et_vect_widen_mult_qi_to_hi_pattern_saved 1
336+ }
337+ }
338+ verbose "check_effective_target_vect_widen_mult_qi_to_hi_pattern: returning $et_vect_widen_mult_qi_to_hi_pattern_saved" 2
339+ return $et_vect_widen_mult_qi_to_hi_pattern_saved
340+}
341+
342+# Return 1 if the target plus current options supports a vector
343+# widening multiplication of *short* args into *int* result, 0 otherwise.
344+#
345+# This won't change for different subtargets so cache the result.
346+
347+proc check_effective_target_vect_widen_mult_hi_to_si_pattern { } {
348+ global et_vect_widen_mult_hi_to_si_pattern
349+
350+ if [info exists et_vect_widen_mult_hi_to_si_pattern_saved] {
351+ verbose "check_effective_target_vect_widen_mult_hi_to_si_pattern: using cached result" 2
352+ } else {
353+ set et_vect_widen_mult_hi_to_si_pattern_saved 0
354+ if { [istarget powerpc*-*-*]
355+ || [istarget spu-*-*]
356+ || [istarget ia64-*-*]
357+ || [istarget i?86-*-*]
358+ || [istarget x86_64-*-*]
359+ || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
360+ set et_vect_widen_mult_hi_to_si_pattern_saved 1
361+ }
362+ }
363+ verbose "check_effective_target_vect_widen_mult_hi_to_si_pattern: returning $et_vect_widen_mult_hi_to_si_pattern_saved" 2
364+ return $et_vect_widen_mult_hi_to_si_pattern_saved
365+}
366+
367+# Return 1 if the target plus current options supports a vector
368 # dot-product of signed chars, 0 otherwise.
369 #
370 # This won't change for different subtargets so cache the result.
371
372=== modified file 'gcc/tree-vect-loop-manip.c'
373--- gcc/tree-vect-loop-manip.c 2011-05-18 13:24:05 +0000
374+++ gcc/tree-vect-loop-manip.c 2011-06-22 12:20:57 +0000
375@@ -1105,35 +1105,6 @@
376 first_niters = PHI_RESULT (newphi);
377 }
378
379-
380-/* Remove dead assignments from loop NEW_LOOP. */
381-
382-static void
383-remove_dead_stmts_from_loop (struct loop *new_loop)
384-{
385- basic_block *bbs = get_loop_body (new_loop);
386- unsigned i;
387- for (i = 0; i < new_loop->num_nodes; ++i)
388- {
389- gimple_stmt_iterator gsi;
390- for (gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);)
391- {
392- gimple stmt = gsi_stmt (gsi);
393- if (is_gimple_assign (stmt)
394- && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
395- && has_zero_uses (gimple_assign_lhs (stmt)))
396- {
397- gsi_remove (&gsi, true);
398- release_defs (stmt);
399- }
400- else
401- gsi_next (&gsi);
402- }
403- }
404- free (bbs);
405-}
406-
407-
408 /* Function slpeel_tree_peel_loop_to_edge.
409
410 Peel the first (last) iterations of LOOP into a new prolog (epilog) loop
411@@ -1445,13 +1416,6 @@
412 BITMAP_FREE (definitions);
413 delete_update_ssa ();
414
415- /* Remove all pattern statements from the loop copy. They will confuse
416- the expander if DCE is disabled.
417- ??? The pattern recognizer should be split into an analysis and
418- a transformation phase that is then run only on the loop that is
419- going to be transformed. */
420- remove_dead_stmts_from_loop (new_loop);
421-
422 adjust_vec_debug_stmts ();
423
424 return new_loop;
425
426=== modified file 'gcc/tree-vect-loop.c'
427--- gcc/tree-vect-loop.c 2011-03-01 13:18:25 +0000
428+++ gcc/tree-vect-loop.c 2011-06-22 12:20:57 +0000
429@@ -244,7 +244,7 @@
430 for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
431 {
432 tree vf_vectype;
433- gimple stmt = gsi_stmt (si);
434+ gimple stmt = gsi_stmt (si), pattern_stmt;
435 stmt_info = vinfo_for_stmt (stmt);
436
437 if (vect_print_dump_info (REPORT_DETAILS))
438@@ -259,9 +259,25 @@
439 if (!STMT_VINFO_RELEVANT_P (stmt_info)
440 && !STMT_VINFO_LIVE_P (stmt_info))
441 {
442- if (vect_print_dump_info (REPORT_DETAILS))
443- fprintf (vect_dump, "skip.");
444- continue;
445+ if (STMT_VINFO_IN_PATTERN_P (stmt_info)
446+ && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
447+ && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
448+ || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
449+ {
450+ stmt = pattern_stmt;
451+ stmt_info = vinfo_for_stmt (pattern_stmt);
452+ if (vect_print_dump_info (REPORT_DETAILS))
453+ {
454+ fprintf (vect_dump, "==> examining pattern statement: ");
455+ print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
456+ }
457+ }
458+ else
459+ {
460+ if (vect_print_dump_info (REPORT_DETAILS))
461+ fprintf (vect_dump, "skip.");
462+ continue;
463+ }
464 }
465
466 if (gimple_get_lhs (stmt) == NULL_TREE)
467@@ -816,25 +832,17 @@
468
469 if (stmt_info)
470 {
471- /* Check if this is a "pattern stmt" (introduced by the
472- vectorizer during the pattern recognition pass). */
473- bool remove_stmt_p = false;
474- gimple orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
475- if (orig_stmt)
476- {
477- stmt_vec_info orig_stmt_info = vinfo_for_stmt (orig_stmt);
478- if (orig_stmt_info
479- && STMT_VINFO_IN_PATTERN_P (orig_stmt_info))
480- remove_stmt_p = true;
481- }
482+ /* Check if this statement has a related "pattern stmt"
483+ (introduced by the vectorizer during the pattern recognition
484+ pass). Free pattern's stmt_vec_info. */
485+ if (STMT_VINFO_IN_PATTERN_P (stmt_info)
486+ && vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info)))
487+ free_stmt_vec_info (STMT_VINFO_RELATED_STMT (stmt_info));
488
489 /* Free stmt_vec_info. */
490 free_stmt_vec_info (stmt);
491+ }
492
493- /* Remove dead "pattern stmts". */
494- if (remove_stmt_p)
495- gsi_remove (&si, true);
496- }
497 gsi_next (&si);
498 }
499 }
500@@ -4262,6 +4270,25 @@
501 return false;
502 }
503
504+ /* In case of widenning multiplication by a constant, we update the type
505+ of the constant to be the type of the other operand. We check that the
506+ constant fits the type in the pattern recognition pass. */
507+ if (code == DOT_PROD_EXPR
508+ && !types_compatible_p (TREE_TYPE (ops[0]), TREE_TYPE (ops[1])))
509+ {
510+ if (TREE_CODE (ops[0]) == INTEGER_CST)
511+ ops[0] = fold_convert (TREE_TYPE (ops[1]), ops[0]);
512+ else if (TREE_CODE (ops[1]) == INTEGER_CST)
513+ ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
514+ else
515+ {
516+ if (vect_print_dump_info (REPORT_DETAILS))
517+ fprintf (vect_dump, "invalid types in dot-prod");
518+
519+ return false;
520+ }
521+ }
522+
523 if (!vec_stmt) /* transformation not required. */
524 {
525 STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
526@@ -4796,7 +4823,7 @@
527
528 for (si = gsi_start_bb (bb); !gsi_end_p (si);)
529 {
530- gimple stmt = gsi_stmt (si);
531+ gimple stmt = gsi_stmt (si), pattern_stmt;
532 bool is_store;
533
534 if (vect_print_dump_info (REPORT_DETAILS))
535@@ -4821,14 +4848,25 @@
536
537 if (!STMT_VINFO_RELEVANT_P (stmt_info)
538 && !STMT_VINFO_LIVE_P (stmt_info))
539- {
540- gsi_next (&si);
541- continue;
542+ {
543+ if (STMT_VINFO_IN_PATTERN_P (stmt_info)
544+ && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
545+ && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
546+ || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
547+ {
548+ stmt = pattern_stmt;
549+ stmt_info = vinfo_for_stmt (stmt);
550+ }
551+ else
552+ {
553+ gsi_next (&si);
554+ continue;
555+ }
556 }
557
558 gcc_assert (STMT_VINFO_VECTYPE (stmt_info));
559- nunits =
560- (unsigned int) TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
561+ nunits = (unsigned int) TYPE_VECTOR_SUBPARTS (
562+ STMT_VINFO_VECTYPE (stmt_info));
563 if (!STMT_SLP_TYPE (stmt_info)
564 && nunits != (unsigned int) vectorization_factor
565 && vect_print_dump_info (REPORT_DETAILS))
566
567=== modified file 'gcc/tree-vect-patterns.c'
568--- gcc/tree-vect-patterns.c 2010-12-02 11:47:12 +0000
569+++ gcc/tree-vect-patterns.c 2011-06-22 12:20:57 +0000
570@@ -38,16 +38,11 @@
571 #include "recog.h"
572 #include "diagnostic-core.h"
573
574-/* Function prototypes */
575-static void vect_pattern_recog_1
576- (gimple (* ) (gimple, tree *, tree *), gimple_stmt_iterator);
577-static bool widened_name_p (tree, gimple, tree *, gimple *);
578-
579 /* Pattern recognition functions */
580-static gimple vect_recog_widen_sum_pattern (gimple, tree *, tree *);
581-static gimple vect_recog_widen_mult_pattern (gimple, tree *, tree *);
582-static gimple vect_recog_dot_prod_pattern (gimple, tree *, tree *);
583-static gimple vect_recog_pow_pattern (gimple, tree *, tree *);
584+static gimple vect_recog_widen_sum_pattern (gimple *, tree *, tree *);
585+static gimple vect_recog_widen_mult_pattern (gimple *, tree *, tree *);
586+static gimple vect_recog_dot_prod_pattern (gimple *, tree *, tree *);
587+static gimple vect_recog_pow_pattern (gimple *, tree *, tree *);
588 static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
589 vect_recog_widen_mult_pattern,
590 vect_recog_widen_sum_pattern,
591@@ -61,10 +56,12 @@
592 is a result of a type-promotion, such that:
593 DEF_STMT: NAME = NOP (name0)
594 where the type of name0 (HALF_TYPE) is smaller than the type of NAME.
595-*/
596+ If CHECK_SIGN is TRUE, check that either both types are signed or both are
597+ unsigned. */
598
599 static bool
600-widened_name_p (tree name, gimple use_stmt, tree *half_type, gimple *def_stmt)
601+widened_name_p (tree name, gimple use_stmt, tree *half_type, gimple *def_stmt,
602+ bool check_sign)
603 {
604 tree dummy;
605 gimple dummy_gimple;
606@@ -98,7 +95,7 @@
607
608 *half_type = TREE_TYPE (oprnd0);
609 if (!INTEGRAL_TYPE_P (type) || !INTEGRAL_TYPE_P (*half_type)
610- || (TYPE_UNSIGNED (type) != TYPE_UNSIGNED (*half_type))
611+ || ((TYPE_UNSIGNED (type) != TYPE_UNSIGNED (*half_type)) && check_sign)
612 || (TYPE_PRECISION (type) < (TYPE_PRECISION (*half_type) * 2)))
613 return false;
614
615@@ -168,12 +165,12 @@
616 inner-loop nested in an outer-loop that us being vectorized). */
617
618 static gimple
619-vect_recog_dot_prod_pattern (gimple last_stmt, tree *type_in, tree *type_out)
620+vect_recog_dot_prod_pattern (gimple *last_stmt, tree *type_in, tree *type_out)
621 {
622 gimple stmt;
623 tree oprnd0, oprnd1;
624 tree oprnd00, oprnd01;
625- stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
626+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (*last_stmt);
627 tree type, half_type;
628 gimple pattern_stmt;
629 tree prod_type;
630@@ -181,10 +178,10 @@
631 struct loop *loop = LOOP_VINFO_LOOP (loop_info);
632 tree var, rhs;
633
634- if (!is_gimple_assign (last_stmt))
635+ if (!is_gimple_assign (*last_stmt))
636 return NULL;
637
638- type = gimple_expr_type (last_stmt);
639+ type = gimple_expr_type (*last_stmt);
640
641 /* Look for the following pattern
642 DX = (TYPE1) X;
643@@ -210,7 +207,7 @@
644 /* Starting from LAST_STMT, follow the defs of its uses in search
645 of the above pattern. */
646
647- if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
648+ if (gimple_assign_rhs_code (*last_stmt) != PLUS_EXPR)
649 return NULL;
650
651 if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
652@@ -231,14 +228,14 @@
653
654 if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
655 return NULL;
656- oprnd0 = gimple_assign_rhs1 (last_stmt);
657- oprnd1 = gimple_assign_rhs2 (last_stmt);
658+ oprnd0 = gimple_assign_rhs1 (*last_stmt);
659+ oprnd1 = gimple_assign_rhs2 (*last_stmt);
660 if (!types_compatible_p (TREE_TYPE (oprnd0), type)
661 || !types_compatible_p (TREE_TYPE (oprnd1), type))
662 return NULL;
663- stmt = last_stmt;
664+ stmt = *last_stmt;
665
666- if (widened_name_p (oprnd0, stmt, &half_type, &def_stmt))
667+ if (widened_name_p (oprnd0, stmt, &half_type, &def_stmt, true))
668 {
669 stmt = def_stmt;
670 oprnd0 = gimple_assign_rhs1 (stmt);
671@@ -293,10 +290,10 @@
672 if (!types_compatible_p (TREE_TYPE (oprnd0), prod_type)
673 || !types_compatible_p (TREE_TYPE (oprnd1), prod_type))
674 return NULL;
675- if (!widened_name_p (oprnd0, stmt, &half_type0, &def_stmt))
676+ if (!widened_name_p (oprnd0, stmt, &half_type0, &def_stmt, true))
677 return NULL;
678 oprnd00 = gimple_assign_rhs1 (def_stmt);
679- if (!widened_name_p (oprnd1, stmt, &half_type1, &def_stmt))
680+ if (!widened_name_p (oprnd1, stmt, &half_type1, &def_stmt, true))
681 return NULL;
682 oprnd01 = gimple_assign_rhs1 (def_stmt);
683 if (!types_compatible_p (half_type0, half_type1))
684@@ -322,7 +319,7 @@
685
686 /* We don't allow changing the order of the computation in the inner-loop
687 when doing outer-loop vectorization. */
688- gcc_assert (!nested_in_vect_loop_p (loop, last_stmt));
689+ gcc_assert (!nested_in_vect_loop_p (loop, *last_stmt));
690
691 return pattern_stmt;
692 }
693@@ -342,24 +339,47 @@
694
695 where type 'TYPE' is at least double the size of type 'type'.
696
697- Input:
698-
699- * LAST_STMT: A stmt from which the pattern search begins. In the example,
700- when this function is called with S5, the pattern {S3,S4,S5} is be detected.
701-
702- Output:
703-
704- * TYPE_IN: The type of the input arguments to the pattern.
705-
706- * TYPE_OUT: The type of the output of this pattern.
707-
708- * Return value: A new stmt that will be used to replace the sequence of
709- stmts that constitute the pattern. In this case it will be:
710- WIDEN_MULT <a_t, b_t>
711-*/
712+ Also detect unsigned cases:
713+
714+ unsigned type a_t, b_t;
715+ unsigned TYPE u_prod_T;
716+ TYPE a_T, b_T, prod_T;
717+
718+ S1 a_t = ;
719+ S2 b_t = ;
720+ S3 a_T = (TYPE) a_t;
721+ S4 b_T = (TYPE) b_t;
722+ S5 prod_T = a_T * b_T;
723+ S6 u_prod_T = (unsigned TYPE) prod_T;
724+
725+ and multiplication by constants:
726+
727+ type a_t;
728+ TYPE a_T, prod_T;
729+
730+ S1 a_t = ;
731+ S3 a_T = (TYPE) a_t;
732+ S5 prod_T = a_T * CONST;
733+
734+ Input:
735+
736+ * LAST_STMT: A stmt from which the pattern search begins. In the example,
737+ when this function is called with S5, the pattern {S3,S4,S5,(S6)} is
738+ detected.
739+
740+ Output:
741+
742+ * TYPE_IN: The type of the input arguments to the pattern.
743+
744+ * TYPE_OUT: The type of the output of this pattern.
745+
746+ * Return value: A new stmt that will be used to replace the sequence of
747+ stmts that constitute the pattern. In this case it will be:
748+ WIDEN_MULT <a_t, b_t>
749+ */
750
751 static gimple
752-vect_recog_widen_mult_pattern (gimple last_stmt,
753+vect_recog_widen_mult_pattern (gimple *last_stmt,
754 tree *type_in,
755 tree *type_out)
756 {
757@@ -367,39 +387,112 @@
758 tree oprnd0, oprnd1;
759 tree type, half_type0, half_type1;
760 gimple pattern_stmt;
761- tree vectype, vectype_out;
762+ tree vectype, vectype_out = NULL_TREE;
763 tree dummy;
764 tree var;
765 enum tree_code dummy_code;
766 int dummy_int;
767 VEC (tree, heap) *dummy_vec;
768+ bool op0_ok, op1_ok;
769
770- if (!is_gimple_assign (last_stmt))
771+ if (!is_gimple_assign (*last_stmt))
772 return NULL;
773
774- type = gimple_expr_type (last_stmt);
775+ type = gimple_expr_type (*last_stmt);
776
777 /* Starting from LAST_STMT, follow the defs of its uses in search
778 of the above pattern. */
779
780- if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR)
781+ if (gimple_assign_rhs_code (*last_stmt) != MULT_EXPR)
782 return NULL;
783
784- oprnd0 = gimple_assign_rhs1 (last_stmt);
785- oprnd1 = gimple_assign_rhs2 (last_stmt);
786+ oprnd0 = gimple_assign_rhs1 (*last_stmt);
787+ oprnd1 = gimple_assign_rhs2 (*last_stmt);
788 if (!types_compatible_p (TREE_TYPE (oprnd0), type)
789 || !types_compatible_p (TREE_TYPE (oprnd1), type))
790 return NULL;
791
792- /* Check argument 0 */
793- if (!widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0))
794- return NULL;
795- oprnd0 = gimple_assign_rhs1 (def_stmt0);
796-
797- /* Check argument 1 */
798- if (!widened_name_p (oprnd1, last_stmt, &half_type1, &def_stmt1))
799- return NULL;
800- oprnd1 = gimple_assign_rhs1 (def_stmt1);
801+ /* Check argument 0. */
802+ op0_ok = widened_name_p (oprnd0, *last_stmt, &half_type0, &def_stmt0, false);
803+ /* Check argument 1. */
804+ op1_ok = widened_name_p (oprnd1, *last_stmt, &half_type1, &def_stmt1, false);
805+
806+ /* In case of multiplication by a constant one of the operands may not match
807+ the pattern, but not both. */
808+ if (!op0_ok && !op1_ok)
809+ return NULL;
810+
811+ if (op0_ok && op1_ok)
812+ {
813+ oprnd0 = gimple_assign_rhs1 (def_stmt0);
814+ oprnd1 = gimple_assign_rhs1 (def_stmt1);
815+ }
816+ else if (!op0_ok)
817+ {
818+ if (CONSTANT_CLASS_P (oprnd0)
819+ && TREE_CODE (half_type1) == INTEGER_TYPE
820+ && tree_int_cst_lt (oprnd0, TYPE_MAXVAL (half_type1))
821+ && tree_int_cst_lt (TYPE_MINVAL (half_type1), oprnd0))
822+ {
823+ /* OPRND0 is a constant of HALF_TYPE1. */
824+ half_type0 = half_type1;
825+ oprnd1 = gimple_assign_rhs1 (def_stmt1);
826+ }
827+ else
828+ return NULL;
829+ }
830+ else if (!op1_ok)
831+ {
832+ if (CONSTANT_CLASS_P (oprnd1)
833+ && TREE_CODE (half_type0) == INTEGER_TYPE
834+ && tree_int_cst_lt (oprnd1, TYPE_MAXVAL (half_type0))
835+ && tree_int_cst_lt (TYPE_MINVAL (half_type0), oprnd1))
836+ {
837+ /* OPRND1 is a constant of HALF_TYPE0. */
838+ half_type1 = half_type0;
839+ oprnd0 = gimple_assign_rhs1 (def_stmt0);
840+ }
841+ else
842+ return NULL;
843+ }
844+
845+ /* Handle unsigned case. Look for
846+ S6 u_prod_T = (unsigned TYPE) prod_T;
847+ Use unsigned TYPE as the type for WIDEN_MULT_EXPR. */
848+ if (TYPE_UNSIGNED (type) != TYPE_UNSIGNED (half_type0))
849+ {
850+ tree lhs = gimple_assign_lhs (*last_stmt), use_lhs;
851+ imm_use_iterator imm_iter;
852+ use_operand_p use_p;
853+ int nuses = 0;
854+ gimple use_stmt = NULL;
855+ tree use_type;
856+
857+ if (TYPE_UNSIGNED (type) == TYPE_UNSIGNED (half_type1))
858+ return NULL;
859+
860+ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
861+ {
862+ if (is_gimple_debug (USE_STMT (use_p)))
863+ continue;
864+ use_stmt = USE_STMT (use_p);
865+ nuses++;
866+ }
867+
868+ if (nuses != 1 || !is_gimple_assign (use_stmt)
869+ || gimple_assign_rhs_code (use_stmt) != NOP_EXPR)
870+ return NULL;
871+
872+ use_lhs = gimple_assign_lhs (use_stmt);
873+ use_type = TREE_TYPE (use_lhs);
874+ if (!INTEGRAL_TYPE_P (use_type)
875+ || (TYPE_UNSIGNED (type) == TYPE_UNSIGNED (use_type))
876+ || (TYPE_PRECISION (type) != TYPE_PRECISION (use_type)))
877+ return NULL;
878+
879+ type = use_type;
880+ *last_stmt = use_stmt;
881+ }
882
883 if (!types_compatible_p (half_type0, half_type1))
884 return NULL;
885@@ -413,7 +506,7 @@
886 vectype_out = get_vectype_for_scalar_type (type);
887 if (!vectype
888 || !vectype_out
889- || !supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt,
890+ || !supportable_widening_operation (WIDEN_MULT_EXPR, *last_stmt,
891 vectype_out, vectype,
892 &dummy, &dummy, &dummy_code,
893 &dummy_code, &dummy_int, &dummy_vec))
894@@ -462,16 +555,16 @@
895 */
896
897 static gimple
898-vect_recog_pow_pattern (gimple last_stmt, tree *type_in, tree *type_out)
899+vect_recog_pow_pattern (gimple *last_stmt, tree *type_in, tree *type_out)
900 {
901 tree fn, base, exp = NULL;
902 gimple stmt;
903 tree var;
904
905- if (!is_gimple_call (last_stmt) || gimple_call_lhs (last_stmt) == NULL)
906+ if (!is_gimple_call (*last_stmt) || gimple_call_lhs (*last_stmt) == NULL)
907 return NULL;
908
909- fn = gimple_call_fndecl (last_stmt);
910+ fn = gimple_call_fndecl (*last_stmt);
911 if (fn == NULL_TREE || DECL_BUILT_IN_CLASS (fn) != BUILT_IN_NORMAL)
912 return NULL;
913
914@@ -481,8 +574,8 @@
915 case BUILT_IN_POWI:
916 case BUILT_IN_POWF:
917 case BUILT_IN_POW:
918- base = gimple_call_arg (last_stmt, 0);
919- exp = gimple_call_arg (last_stmt, 1);
920+ base = gimple_call_arg (*last_stmt, 0);
921+ exp = gimple_call_arg (*last_stmt, 1);
922 if (TREE_CODE (exp) != REAL_CST
923 && TREE_CODE (exp) != INTEGER_CST)
924 return NULL;
925@@ -574,21 +667,21 @@
926 inner-loop nested in an outer-loop that us being vectorized). */
927
928 static gimple
929-vect_recog_widen_sum_pattern (gimple last_stmt, tree *type_in, tree *type_out)
930+vect_recog_widen_sum_pattern (gimple *last_stmt, tree *type_in, tree *type_out)
931 {
932 gimple stmt;
933 tree oprnd0, oprnd1;
934- stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
935+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (*last_stmt);
936 tree type, half_type;
937 gimple pattern_stmt;
938 loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
939 struct loop *loop = LOOP_VINFO_LOOP (loop_info);
940 tree var;
941
942- if (!is_gimple_assign (last_stmt))
943+ if (!is_gimple_assign (*last_stmt))
944 return NULL;
945
946- type = gimple_expr_type (last_stmt);
947+ type = gimple_expr_type (*last_stmt);
948
949 /* Look for the following pattern
950 DX = (TYPE) X;
951@@ -600,25 +693,25 @@
952 /* Starting from LAST_STMT, follow the defs of its uses in search
953 of the above pattern. */
954
955- if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
956+ if (gimple_assign_rhs_code (*last_stmt) != PLUS_EXPR)
957 return NULL;
958
959 if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
960 return NULL;
961
962- oprnd0 = gimple_assign_rhs1 (last_stmt);
963- oprnd1 = gimple_assign_rhs2 (last_stmt);
964+ oprnd0 = gimple_assign_rhs1 (*last_stmt);
965+ oprnd1 = gimple_assign_rhs2 (*last_stmt);
966 if (!types_compatible_p (TREE_TYPE (oprnd0), type)
967 || !types_compatible_p (TREE_TYPE (oprnd1), type))
968 return NULL;
969
970- /* So far so good. Since last_stmt was detected as a (summation) reduction,
971+ /* So far so good. Since *last_stmt was detected as a (summation) reduction,
972 we know that oprnd1 is the reduction variable (defined by a loop-header
973 phi), and oprnd0 is an ssa-name defined by a stmt in the loop body.
974 Left to check that oprnd0 is defined by a cast from type 'type' to type
975 'TYPE'. */
976
977- if (!widened_name_p (oprnd0, last_stmt, &half_type, &stmt))
978+ if (!widened_name_p (oprnd0, *last_stmt, &half_type, &stmt, true))
979 return NULL;
980
981 oprnd0 = gimple_assign_rhs1 (stmt);
982@@ -639,7 +732,7 @@
983
984 /* We don't allow changing the order of the computation in the inner-loop
985 when doing outer-loop vectorization. */
986- gcc_assert (!nested_in_vect_loop_p (loop, last_stmt));
987+ gcc_assert (!nested_in_vect_loop_p (loop, *last_stmt));
988
989 return pattern_stmt;
990 }
991@@ -669,23 +762,27 @@
992
993 static void
994 vect_pattern_recog_1 (
995- gimple (* vect_recog_func) (gimple, tree *, tree *),
996+ gimple (* vect_recog_func) (gimple *, tree *, tree *),
997 gimple_stmt_iterator si)
998 {
999 gimple stmt = gsi_stmt (si), pattern_stmt;
1000- stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
1001+ stmt_vec_info stmt_info;
1002 stmt_vec_info pattern_stmt_info;
1003- loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
1004+ loop_vec_info loop_vinfo;
1005 tree pattern_vectype;
1006 tree type_in, type_out;
1007 enum tree_code code;
1008 int i;
1009 gimple next;
1010
1011- pattern_stmt = (* vect_recog_func) (stmt, &type_in, &type_out);
1012+ pattern_stmt = (* vect_recog_func) (&stmt, &type_in, &type_out);
1013 if (!pattern_stmt)
1014 return;
1015
1016+ si = gsi_for_stmt (stmt);
1017+ stmt_info = vinfo_for_stmt (stmt);
1018+ loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
1019+
1020 if (VECTOR_MODE_P (TYPE_MODE (type_in)))
1021 {
1022 /* No need to check target support (already checked by the pattern
1023@@ -736,9 +833,9 @@
1024 }
1025
1026 /* Mark the stmts that are involved in the pattern. */
1027- gsi_insert_before (&si, pattern_stmt, GSI_SAME_STMT);
1028 set_vinfo_for_stmt (pattern_stmt,
1029 new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL));
1030+ gimple_set_bb (pattern_stmt, gimple_bb (stmt));
1031 pattern_stmt_info = vinfo_for_stmt (pattern_stmt);
1032
1033 STMT_VINFO_RELATED_STMT (pattern_stmt_info) = stmt;
1034@@ -761,8 +858,8 @@
1035 LOOP_VINFO - a struct_loop_info of a loop in which we want to look for
1036 computation idioms.
1037
1038- Output - for each computation idiom that is detected we insert a new stmt
1039- that provides the same functionality and that can be vectorized. We
1040+ Output - for each computation idiom that is detected we create a new stmt
1041+ that provides the same functionality and that can be vectorized. We
1042 also record some information in the struct_stmt_info of the relevant
1043 stmts, as explained below:
1044
1045@@ -777,52 +874,48 @@
1046 S5: ... = ..use(a_0).. - - -
1047
1048 Say the sequence {S1,S2,S3,S4} was detected as a pattern that can be
1049- represented by a single stmt. We then:
1050- - create a new stmt S6 that will replace the pattern.
1051- - insert the new stmt S6 before the last stmt in the pattern
1052+ represented by a single stmt. We then:
1053+ - create a new stmt S6 equivalent to the pattern (the stmt is not
1054+ inserted into the code)
1055 - fill in the STMT_VINFO fields as follows:
1056
1057 in_pattern_p related_stmt vec_stmt
1058 S1: a_i = .... - - -
1059 S2: a_2 = ..use(a_i).. - - -
1060 S3: a_1 = ..use(a_2).. - - -
1061- > S6: a_new = .... - S4 -
1062 S4: a_0 = ..use(a_1).. true S6 -
1063+ '---> S6: a_new = .... - S4 -
1064 S5: ... = ..use(a_0).. - - -
1065
1066 (the last stmt in the pattern (S4) and the new pattern stmt (S6) point
1067- to each other through the RELATED_STMT field).
1068+ to each other through the RELATED_STMT field).
1069
1070 S6 will be marked as relevant in vect_mark_stmts_to_be_vectorized instead
1071 of S4 because it will replace all its uses. Stmts {S1,S2,S3} will
1072 remain irrelevant unless used by stmts other than S4.
1073
1074 If vectorization succeeds, vect_transform_stmt will skip over {S1,S2,S3}
1075- (because they are marked as irrelevant). It will vectorize S6, and record
1076+ (because they are marked as irrelevant). It will vectorize S6, and record
1077 a pointer to the new vector stmt VS6 both from S6 (as usual), and also
1078- from S4. We do that so that when we get to vectorizing stmts that use the
1079+ from S4. We do that so that when we get to vectorizing stmts that use the
1080 def of S4 (like S5 that uses a_0), we'll know where to take the relevant
1081- vector-def from. S4 will be skipped, and S5 will be vectorized as usual:
1082+ vector-def from. S4 will be skipped, and S5 will be vectorized as usual:
1083
1084 in_pattern_p related_stmt vec_stmt
1085 S1: a_i = .... - - -
1086 S2: a_2 = ..use(a_i).. - - -
1087 S3: a_1 = ..use(a_2).. - - -
1088 > VS6: va_new = .... - - -
1089- S6: a_new = .... - S4 VS6
1090 S4: a_0 = ..use(a_1).. true S6 VS6
1091+ '---> S6: a_new = .... - S4 VS6
1092 > VS5: ... = ..vuse(va_new).. - - -
1093 S5: ... = ..use(a_0).. - - -
1094
1095- DCE could then get rid of {S1,S2,S3,S4,S5,S6} (if their defs are not used
1096+ DCE could then get rid of {S1,S2,S3,S4,S5} (if their defs are not used
1097 elsewhere), and we'll end up with:
1098
1099 VS6: va_new = ....
1100- VS5: ... = ..vuse(va_new)..
1101-
1102- If vectorization does not succeed, DCE will clean S6 away (its def is
1103- not used), and we'll end up with the original sequence.
1104-*/
1105+ VS5: ... = ..vuse(va_new).. */
1106
1107 void
1108 vect_pattern_recog (loop_vec_info loop_vinfo)
1109@@ -832,7 +925,7 @@
1110 unsigned int nbbs = loop->num_nodes;
1111 gimple_stmt_iterator si;
1112 unsigned int i, j;
1113- gimple (* vect_recog_func_ptr) (gimple, tree *, tree *);
1114+ gimple (* vect_recog_func_ptr) (gimple *, tree *, tree *);
1115
1116 if (vect_print_dump_info (REPORT_DETAILS))
1117 fprintf (vect_dump, "=== vect_pattern_recog ===");
1118
1119=== modified file 'gcc/tree-vect-slp.c'
1120--- gcc/tree-vect-slp.c 2011-05-05 15:43:06 +0000
1121+++ gcc/tree-vect-slp.c 2011-06-22 12:20:57 +0000
1122@@ -2510,6 +2510,8 @@
1123 && STMT_VINFO_STRIDED_ACCESS (stmt_info)
1124 && !REFERENCE_CLASS_P (gimple_get_lhs (stmt)))
1125 si = gsi_for_stmt (SLP_INSTANCE_FIRST_LOAD_STMT (instance));
1126+ else if (is_pattern_stmt_p (stmt_info))
1127+ si = gsi_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info));
1128 else
1129 si = gsi_for_stmt (stmt);
1130
1131
1132=== modified file 'gcc/tree-vect-stmts.c'
1133--- gcc/tree-vect-stmts.c 2011-06-02 12:12:00 +0000
1134+++ gcc/tree-vect-stmts.c 2011-06-22 12:20:57 +0000
1135@@ -605,15 +605,76 @@
1136 break;
1137 }
1138
1139- FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
1140- {
1141- tree op = USE_FROM_PTR (use_p);
1142- if (!process_use (stmt, op, loop_vinfo, live_p, relevant, &worklist))
1143- {
1144- VEC_free (gimple, heap, worklist);
1145- return false;
1146- }
1147- }
1148+ if (is_pattern_stmt_p (vinfo_for_stmt (stmt)))
1149+ {
1150+ /* Pattern statements are not inserted into the code, so
1151+ FOR_EACH_PHI_OR_STMT_USE optimizes their operands out, and we
1152+ have to scan the RHS or function arguments instead. */
1153+ if (is_gimple_assign (stmt))
1154+ {
1155+ tree rhs = gimple_assign_rhs1 (stmt);
1156+ if (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
1157+ == GIMPLE_SINGLE_RHS)
1158+ {
1159+ unsigned int op_num = TREE_OPERAND_LENGTH (gimple_assign_rhs1
1160+ (stmt));
1161+ for (i = 0; i < op_num; i++)
1162+ {
1163+ tree op = TREE_OPERAND (rhs, i);
1164+ if (!process_use (stmt, op, loop_vinfo, live_p, relevant,
1165+ &worklist))
1166+ {
1167+ VEC_free (gimple, heap, worklist);
1168+ return false;
1169+ }
1170+ }
1171+ }
1172+ else if (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
1173+ == GIMPLE_BINARY_RHS)
1174+ {
1175+ tree op = gimple_assign_rhs1 (stmt);
1176+ if (!process_use (stmt, op, loop_vinfo, live_p, relevant,
1177+ &worklist))
1178+ {
1179+ VEC_free (gimple, heap, worklist);
1180+ return false;
1181+ }
1182+ op = gimple_assign_rhs2 (stmt);
1183+ if (!process_use (stmt, op, loop_vinfo, live_p, relevant,
1184+ &worklist))
1185+ {
1186+ VEC_free (gimple, heap, worklist);
1187+ return false;
1188+ }
1189+ }
1190+ else
1191+ return false;
1192+ }
1193+ else if (is_gimple_call (stmt))
1194+ {
1195+ for (i = 0; i < gimple_call_num_args (stmt); i++)
1196+ {
1197+ tree arg = gimple_call_arg (stmt, i);
1198+ if (!process_use (stmt, arg, loop_vinfo, live_p, relevant,
1199+ &worklist))
1200+ {
1201+ VEC_free (gimple, heap, worklist);
1202+ return false;
1203+ }
1204+ }
1205+ }
1206+ }
1207+ else
1208+ FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
1209+ {
1210+ tree op = USE_FROM_PTR (use_p);
1211+ if (!process_use (stmt, op, loop_vinfo, live_p, relevant,
1212+ &worklist))
1213+ {
1214+ VEC_free (gimple, heap, worklist);
1215+ return false;
1216+ }
1217+ }
1218 } /* while worklist */
1219
1220 VEC_free (gimple, heap, worklist);
1221@@ -1405,6 +1466,7 @@
1222 VEC(tree, heap) *vargs = NULL;
1223 enum { NARROW, NONE, WIDEN } modifier;
1224 size_t i, nargs;
1225+ tree lhs;
1226
1227 /* FORNOW: unsupported in basic block SLP. */
1228 gcc_assert (loop_vinfo);
1229@@ -1542,7 +1604,7 @@
1230 /** Transform. **/
1231
1232 if (vect_print_dump_info (REPORT_DETAILS))
1233- fprintf (vect_dump, "transform operation.");
1234+ fprintf (vect_dump, "transform call.");
1235
1236 /* Handle def. */
1237 scalar_dest = gimple_call_lhs (stmt);
1238@@ -1661,8 +1723,11 @@
1239 rhs of the statement with something harmless. */
1240
1241 type = TREE_TYPE (scalar_dest);
1242- new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
1243- build_zero_cst (type));
1244+ if (is_pattern_stmt_p (stmt_info))
1245+ lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info));
1246+ else
1247+ lhs = gimple_call_lhs (stmt);
1248+ new_stmt = gimple_build_assign (lhs, build_zero_cst (type));
1249 set_vinfo_for_stmt (new_stmt, stmt_info);
1250 set_vinfo_for_stmt (stmt, NULL);
1251 STMT_VINFO_STMT (stmt_info) = new_stmt;
1252@@ -3231,6 +3296,33 @@
1253 fprintf (vect_dump, "use not simple.");
1254 return false;
1255 }
1256+
1257+ op_type = TREE_CODE_LENGTH (code);
1258+ if (op_type == binary_op)
1259+ {
1260+ bool ok;
1261+
1262+ op1 = gimple_assign_rhs2 (stmt);
1263+ if (code == WIDEN_MULT_EXPR)
1264+ {
1265+ /* For WIDEN_MULT_EXPR, if OP0 is a constant, use the type of
1266+ OP1. */
1267+ if (CONSTANT_CLASS_P (op0))
1268+ ok = vect_is_simple_use_1 (op1, loop_vinfo, NULL,
1269+ &def_stmt, &def, &dt[1], &vectype_in);
1270+ else
1271+ ok = vect_is_simple_use (op1, loop_vinfo, NULL, &def_stmt, &def,
1272+ &dt[1]);
1273+
1274+ if (!ok)
1275+ {
1276+ if (vect_print_dump_info (REPORT_DETAILS))
1277+ fprintf (vect_dump, "use not simple.");
1278+ return false;
1279+ }
1280+ }
1281+ }
1282+
1283 /* If op0 is an external or constant def use a vector type with
1284 the same size as the output vector type. */
1285 if (!vectype_in)
1286@@ -3263,18 +3355,6 @@
1287
1288 gcc_assert (ncopies >= 1);
1289
1290- op_type = TREE_CODE_LENGTH (code);
1291- if (op_type == binary_op)
1292- {
1293- op1 = gimple_assign_rhs2 (stmt);
1294- if (!vect_is_simple_use (op1, loop_vinfo, NULL, &def_stmt, &def, &dt[1]))
1295- {
1296- if (vect_print_dump_info (REPORT_DETAILS))
1297- fprintf (vect_dump, "use not simple.");
1298- return false;
1299- }
1300- }
1301-
1302 /* Supportable by target? */
1303 if (!supportable_widening_operation (code, stmt, vectype_out, vectype_in,
1304 &decl1, &decl2, &code1, &code2,
1305@@ -3300,6 +3380,14 @@
1306 fprintf (vect_dump, "transform type promotion operation. ncopies = %d.",
1307 ncopies);
1308
1309+ if (code == WIDEN_MULT_EXPR)
1310+ {
1311+ if (CONSTANT_CLASS_P (op0))
1312+ op0 = fold_convert (TREE_TYPE (op1), op0);
1313+ else if (CONSTANT_CLASS_P (op1))
1314+ op1 = fold_convert (TREE_TYPE (op0), op1);
1315+ }
1316+
1317 /* Handle def. */
1318 /* In case of multi-step promotion, we first generate promotion operations
1319 to the intermediate types, and then from that types to the final one.
1320@@ -4824,10 +4912,26 @@
1321 if (!STMT_VINFO_RELEVANT_P (stmt_info)
1322 && !STMT_VINFO_LIVE_P (stmt_info))
1323 {
1324- if (vect_print_dump_info (REPORT_DETAILS))
1325- fprintf (vect_dump, "irrelevant.");
1326+ gimple pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
1327+ if (STMT_VINFO_IN_PATTERN_P (stmt_info)
1328+ && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
1329+ || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
1330+ {
1331+ stmt = pattern_stmt;
1332+ stmt_info = vinfo_for_stmt (pattern_stmt);
1333+ if (vect_print_dump_info (REPORT_DETAILS))
1334+ {
1335+ fprintf (vect_dump, "==> examining pattern statement: ");
1336+ print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
1337+ }
1338+ }
1339+ else
1340+ {
1341+ if (vect_print_dump_info (REPORT_DETAILS))
1342+ fprintf (vect_dump, "irrelevant.");
1343
1344- return true;
1345+ return true;
1346+ }
1347 }
1348
1349 switch (STMT_VINFO_DEF_TYPE (stmt_info))
1350
1351=== modified file 'gcc/tree-vectorizer.h'
1352--- gcc/tree-vectorizer.h 2011-05-05 15:43:06 +0000
1353+++ gcc/tree-vectorizer.h 2011-06-22 12:20:57 +0000
1354@@ -884,7 +884,7 @@
1355 /* Pattern recognition functions.
1356 Additional pattern recognition functions can (and will) be added
1357 in the future. */
1358-typedef gimple (* vect_recog_func_ptr) (gimple, tree *, tree *);
1359+typedef gimple (* vect_recog_func_ptr) (gimple *, tree *, tree *);
1360 #define NUM_PATTERNS 4
1361 void vect_pattern_recog (loop_vec_info);
1362
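
For reference, the three statement shapes described in the vect_recog_widen_mult_pattern comment above can be written as plain C loops. This is only a sketch: the function names and the constant 300 are made up for illustration and do not come from the patch or its tests, and whether a given compiler actually vectorizes these loops depends on the target and the flags used. The unsigned variant spells out S3..S6 explicitly; callers are assumed to keep each product within the range of int32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed case: both operands are widened (S3, S4) and multiplied in the
   wide type (S5).  */
void widen_mult_s16(const int16_t *a, const int16_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t) a[i] * (int32_t) b[i];
}

/* Unsigned case: unsigned narrow operands are widened to a signed wide
   type, multiplied there, and the product is then converted to the
   unsigned wide type (S6).  Assumption: each product fits in int32_t.  */
void widen_mult_u16(const uint16_t *a, const uint16_t *b, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t a_T = (int32_t) a[i];   /* S3 */
        int32_t b_T = (int32_t) b[i];   /* S4 */
        int32_t prod_T = a_T * b_T;     /* S5 */
        out[i] = (uint32_t) prod_T;     /* S6 */
    }
}

/* Constant case: only one operand is a widened name; the other is a
   constant lying strictly between the minimum and maximum of the narrow
   type.  The value 300 is a hypothetical example.  */
void widen_mult_const_s16(const int16_t *a, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t) a[i] * 300;
}
```

The constant variant is the one that relies on the new check in the patch: the constant has to be strictly greater than TYPE_MINVAL and strictly less than TYPE_MAXVAL of the other operand's half type, otherwise the pattern is rejected.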
