Merge lp:~ams-codesourcery/gcc-linaro/maddhidi4-4.6 into lp:gcc-linaro/4.6

Proposed by Andrew Stubbs
Status: Superseded
Proposed branch: lp:~ams-codesourcery/gcc-linaro/maddhidi4-4.6
Merge into: lp:gcc-linaro/4.6
Diff against target: 346 lines (+260/-0) (has conflicts)
9 files modified
ChangeLog.linaro (+35/-0)
gcc/config/arm/arm.md (+63/-0)
gcc/doc/md.texi (+17/-0)
gcc/simplify-rtx.c (+84/-0)
gcc/testsuite/gcc.target/arm/mla-2.c (+9/-0)
gcc/testsuite/gcc.target/arm/smlaltb-1.c (+13/-0)
gcc/testsuite/gcc.target/arm/smlaltt-1.c (+13/-0)
gcc/testsuite/gcc.target/arm/smlatb-1.c (+13/-0)
gcc/testsuite/gcc.target/arm/smlatt-1.c (+13/-0)
Text conflict in ChangeLog.linaro
To merge this branch: bzr merge lp:~ams-codesourcery/gcc-linaro/maddhidi4-4.6
Reviewer                  Review Type  Date Requested  Status
Linaro Toolchain Builder                               Needs Fixing
Review via email: mp+61410@code.launchpad.net

This proposal has been superseded by a proposal from 2011-06-02.

Description of the change

A target-independent patch that improves how combine handles HImode to DImode multiply-and-accumulate.
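
For illustration only, the kind of source this is aimed at can be sketched in C as below. This is a minimal sketch modelled on the testcases in the diff; the function names mac_bb and mac_tb are made up here, and which instruction the compiler actually selects depends on the target and options.

```c
#include <stdint.h>

/* Bottom x bottom: both operands are plain 16-bit values.
   a * b is evaluated in int (32 bits) and cannot overflow, so
   extending the product to 64 bits gives the same value as
   multiplying the extended operands. */
int64_t mac_bb(int64_t acc, int16_t a, int16_t b)
{
    return acc + a * b;
}

/* Top x bottom: one operand is the upper half of a 32-bit word.
   Assumption: right-shifting a negative int is an arithmetic shift
   (implementation-defined in ISO C, but this is what GCC does). */
int64_t mac_tb(int64_t acc, int32_t in)
{
    int16_t bottom = (int16_t)(in & 0xffff);
    int16_t top = (int16_t)(in >> 16);
    return acc + top * bottom;
}
```

For example, mac_tb(1, 0x00030007) multiplies the top half (3) by the bottom half (7) and adds the accumulator.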

I've posted this patch upstream here:

http://<email address hidden>/msg05794.html

I'm waiting for upstream review, so I've submitted this merge proposal mostly to get the patch tested.

Linaro Toolchain Builder (cbuild) wrote :

cbuild has taken a snapshot of this branch at r106750 and queued it for build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6.tar.xdelta3.xz

and will be built on the following builders:
 a9-builder i686 x86_64

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106749
cbuild-state: check

Linaro Toolchain Builder (cbuild) wrote :

cbuild had trouble building this on i686-lucid-cbuild117-scorpius-i686r1.
See the following failure logs:
 failed.txt gcc-build-failed.txt

under the build results at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6/logs/i686-lucid-cbuild117-scorpius-i686r1

The test suite was not checked as this build has no .sum style test results

cbuild-checked: i686-lucid-cbuild117-scorpius-i686r1

review: Needs Fixing
Linaro Toolchain Builder (cbuild) wrote :

cbuild had trouble building this on x86_64-maverick-cbuild117-crucis-x86_64r1.
See the following failure logs:
 failed.txt gcc-build-failed.txt

under the build results at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6/logs/x86_64-maverick-cbuild117-crucis-x86_64r1

The test suite was not checked as this build has no .sum style test results

cbuild-checked: x86_64-maverick-cbuild117-crucis-x86_64r1

review: Needs Fixing
Michael Hope (michaelh1) wrote :

The i686 and x86_64 builds show similar errors, so the fault is probably real.

Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on armv7l-maverick-cbuild116-ursa4-cortexa9r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6/logs/armv7l-maverick-cbuild116-ursa4-cortexa9r1

The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106749:
 -PASS: gcc.dg/range-test-1.c execution test
 +FAIL: gcc.dg/range-test-1.c execution test
 -PASS: gcc.dg/torture/pr43017.c -O2 execution test
 -PASS: gcc.dg/torture/pr43017.c -O2 -flto execution test
 -PASS: gcc.dg/torture/pr43017.c -O2 -flto -flto-partition=none execution test
 -PASS: gcc.dg/torture/pr43017.c -O2 -flto -flto-partition=none (test for excess errors)
 -PASS: gcc.dg/torture/pr43017.c -O2 -flto (test for excess errors)
 -PASS: gcc.dg/torture/pr43017.c -O2 (test for excess errors)
 -PASS: gcc.dg/torture/pr43017.c -O3 -fomit-frame-pointer execution test
 -PASS: gcc.dg/torture/pr43017.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
 +UNRESOLVED: gcc.dg/torture/pr43017.c -O2 compilation failed to produce executable
 +UNRESOLVED: gcc.dg/torture/pr43017.c -O2 -flto compilation failed to produce executable
 +UNRESOLVED: gcc.dg/torture/pr43017.c -O2 -flto -flto-partition=none compilation failed to produce executable
 +FAIL: gcc.dg/torture/pr43017.c -O2 -flto -flto-partition=none (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c -O2 -flto -flto-partition=none (test for excess errors)
 +FAIL: gcc.dg/torture/pr43017.c -O2 -flto (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c -O2 -flto (test for excess errors)
 +FAIL: gcc.dg/torture/pr43017.c -O2 (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c -O2 (test for excess errors)
 +FAIL: gcc.dg/torture/pr43017.c -O3 -fomit-frame-pointer execution test
 +FAIL: gcc.dg/torture/pr43017.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
 -PASS: gcc.dg/torture/pr43017.c -O3 -fomit-frame-pointer -funroll-loops execution test
 +FAIL: gcc.dg/torture/pr43017.c -O3 -fomit-frame-pointer -funroll-loops execution test
 -PASS: gcc.dg/torture/pr43017.c -O3 -g execution test
 +FAIL: gcc.dg/torture/pr43017.c -O3 -g execution test
 -PASS: gcc.dg/torture/pr43017.c -Os execution test
 -PASS: gcc.dg/torture/pr43017.c -Os (test for excess errors)
 +UNRESOLVED: gcc.dg/torture/pr43017.c -Os compilation failed to produce executable
 +FAIL: gcc.dg/torture/pr43017.c -Os (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c -Os (test for excess errors)
 -PASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c execution test
 +UNRESOLVED: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c compilation failed to produce executable
 +FAIL: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (internal compiler error)
 -PASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (test for excess errors)
 +FAIL: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (test for excess errors)
 -PASS: gcc.dg/vect/pr20122.c execution test
 -PASS: gcc.dg/vect/pr20122.c scan-tree-dump-times vect "vectorized 1 loops" 3
 -PASS: gcc.dg/vect/pr20122.c (test for excess errors)
 +UNRESOLVED: gcc.dg/vect/pr20122...


Preview Diff

=== modified file 'ChangeLog.linaro'
--- ChangeLog.linaro 2011-06-02 12:12:00 +0000
+++ ChangeLog.linaro 2011-06-02 16:19:50 +0000
@@ -1,3 +1,4 @@
+<<<<<<< TREE
 2001-06-02 Richard Sandiford <richard.sandiford@linaro.org>

 gcc/
@@ -336,6 +337,40 @@
 * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict FPA_REGS
 case to VFPv1.

+=======
+2011-06-02 Andrew Stubbs <ams@codesourcery.com>
+
+ Backport of patch proposed for FSF:
+
+ 2011-05-27 Andrew Stubbs <ams@codesourcery.com>
+
+ gcc/
+ * config/arm/arm.md (*maddhidi4tb, *maddhidi4tt): New define_insns.
+ (*maddhisi4tb, *maddhisi4tt): New define_insns.
+
+ gcc/testsuite/
+ * gcc.target/arm/smlatb-1.c: New file.
+ * gcc.target/arm/smlatt-1.c: New file.
+ * gcc.target/arm/smlaltb-1.c: New file.
+ * gcc.target/arm/smlaltt-1.c: New file.
+
+2011-06-02 Andrew Stubbs <ams@codesourcery.com>
+
+ Backport of patch proposed for FSF:
+
+ 2011-05-26 Bernd Schmidt <bernds@codesourcery.com>
+ Andrew Stubbs <ams@codesourcery.com>
+
+ gcc/
+ * simplify-rtx.c (simplify_unary_operation_1): Canonicalize widening
+ multiplies.
+ * doc/md.texi (Canonicalization of Instructions): Document widening
+ multiply canonicalization.
+
+ gcc/testsuite/
+ * gcc.target/arm/mla-2.c: New test.
+
+>>>>>>> MERGE-SOURCE
 2011-05-26 Andrew Stubbs <ams@codesourcery.com>

 Merge from FSF GCC 4.6 (svn branches/gcc-4_6-branch 174261).

=== modified file 'gcc/config/arm/arm.md'
--- gcc/config/arm/arm.md 2011-05-13 13:42:39 +0000
+++ gcc/config/arm/arm.md 2011-06-02 16:19:50 +0000
@@ -1809,6 +1809,36 @@
 (set_attr "predicable" "yes")]
 )

+;; Note: there is no maddhisi4ibt because this one is canonical form
+(define_insn "*maddhisi4tb"
+ [(set (match_operand:SI 0 "s_register_operand" "=r")
+ (plus:SI (mult:SI (ashiftrt:SI
+ (match_operand:SI 1 "s_register_operand" "r")
+ (const_int 16))
+ (sign_extend:SI
+ (match_operand:HI 2 "s_register_operand" "r")))
+ (match_operand:SI 3 "s_register_operand" "r")))]
+ "TARGET_DSP_MULTIPLY"
+ "smlatb%?\\t%0, %1, %2, %3"
+ [(set_attr "insn" "smlaxy")
+ (set_attr "predicable" "yes")]
+)
+
+(define_insn "*maddhisi4tt"
+ [(set (match_operand:SI 0 "s_register_operand" "=r")
+ (plus:SI (mult:SI (ashiftrt:SI
+ (match_operand:SI 1 "s_register_operand" "r")
+ (const_int 16))
+ (ashiftrt:SI
+ (match_operand:SI 2 "s_register_operand" "r")
+ (const_int 16)))
+ (match_operand:SI 3 "s_register_operand" "r")))]
+ "TARGET_DSP_MULTIPLY"
+ "smlatt%?\\t%0, %1, %2, %3"
+ [(set_attr "insn" "smlaxy")
+ (set_attr "predicable" "yes")]
+)
+
 (define_insn "*maddhidi4"
 [(set (match_operand:DI 0 "s_register_operand" "=r")
 (plus:DI
@@ -1822,6 +1852,39 @@
 [(set_attr "insn" "smlalxy")
 (set_attr "predicable" "yes")])

+;; Note: there is no maddhidi4ibt because this one is canonical form
+(define_insn "*maddhidi4tb"
+ [(set (match_operand:DI 0 "s_register_operand" "=r")
+ (plus:DI
+ (mult:DI (sign_extend:DI
+ (ashiftrt:SI
+ (match_operand:SI 1 "s_register_operand" "r")
+ (const_int 16)))
+ (sign_extend:DI
+ (match_operand:HI 2 "s_register_operand" "r")))
+ (match_operand:DI 3 "s_register_operand" "0")))]
+ "TARGET_DSP_MULTIPLY"
+ "smlaltb%?\\t%Q0, %R0, %1, %2"
+ [(set_attr "insn" "smlalxy")
+ (set_attr "predicable" "yes")])
+
+(define_insn "*maddhidi4tt"
+ [(set (match_operand:DI 0 "s_register_operand" "=r")
+ (plus:DI
+ (mult:DI (sign_extend:DI
+ (ashiftrt:SI
+ (match_operand:SI 1 "s_register_operand" "r")
+ (const_int 16)))
+ (sign_extend:DI
+ (ashiftrt:SI
+ (match_operand:SI 2 "s_register_operand" "r")
+ (const_int 16))))
+ (match_operand:DI 3 "s_register_operand" "0")))]
+ "TARGET_DSP_MULTIPLY"
+ "smlaltt%?\\t%Q0, %R0, %1, %2"
+ [(set_attr "insn" "smlalxy")
+ (set_attr "predicable" "yes")])
+
 (define_expand "mulsf3"
 [(set (match_operand:SF 0 "s_register_operand" "")
 (mult:SF (match_operand:SF 1 "s_register_operand" "")

=== modified file 'gcc/doc/md.texi'
--- gcc/doc/md.texi 2011-05-05 15:43:06 +0000
+++ gcc/doc/md.texi 2011-06-02 16:19:50 +0000
@@ -5929,6 +5929,23 @@
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations.

+@cindex @code{mult}, canonicalization of
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x})
+(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1}
+(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise
+for @code{zero_extend}.
+
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2}
+@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted
+to @code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2}
+@var{x} @var{s})) (sign_extend:@var{m1} @var{y}))}, and likewise for
+patterns using @code{zero_extend} and @code{lshiftrt}. If the second
+operand of @code{mult} is also a shift, then that is extended also.
+This transformation is only applied when it can be proven that the
+original operation had sufficient precision to prevent overflow.
+
 @end itemize

 Further canonicalization rules are defined in the function

=== modified file 'gcc/simplify-rtx.c'
--- gcc/simplify-rtx.c 2011-05-27 14:31:18 +0000
+++ gcc/simplify-rtx.c 2011-06-02 16:19:50 +0000
@@ -1000,6 +1000,48 @@
 && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
 return XEXP (op, 0);

+ /* Extending a widening multiplication should be canonicalized to
+ a wider widening multiplication. */
+ if (GET_CODE (op) == MULT)
+ {
+ rtx lhs = XEXP (op, 0);
+ rtx rhs = XEXP (op, 1);
+ enum rtx_code lcode = GET_CODE (lhs);
+ enum rtx_code rcode = GET_CODE (rhs);
+
+ /* Widening multiplies usually extend both operands, but sometimes
+ they use a shift to extract a portion of a register. */
+ if ((lcode == SIGN_EXTEND
+ || (lcode == ASHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
+ && (rcode == SIGN_EXTEND
+ || (rcode == ASHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
+ {
+ enum machine_mode lmode = GET_MODE (lhs);
+ enum machine_mode rmode = GET_MODE (rhs);
+ int bits;
+
+ if (lcode == ASHIFTRT)
+ /* Number of bits not shifted off the end. */
+ bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
+ else /* lcode == SIGN_EXTEND */
+ /* Size of inner mode. */
+ bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
+
+ if (rcode == ASHIFTRT)
+ bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
+ else /* rcode == SIGN_EXTEND */
+ bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
+
+ /* We can only widen multiplies if the result is mathematiclly
+ equivalent. I.e. if overflow was impossible. */
+ if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
+ return simplify_gen_binary
+ (MULT, mode,
+ simplify_gen_unary (SIGN_EXTEND, mode, lhs, lmode),
+ simplify_gen_unary (SIGN_EXTEND, mode, rhs, rmode));
+ }
+ }
+
 /* Check for a sign extension of a subreg of a promoted
 variable, where the promotion is sign-extended, and the
 target mode is the same as the variable's promotion. */
@@ -1071,6 +1113,48 @@
 && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0))))
 return rtl_hooks.gen_lowpart_no_emit (mode, op);

+ /* Extending a widening multiplication should be canonicalized to
+ a wider widening multiplication. */
+ if (GET_CODE (op) == MULT)
+ {
+ rtx lhs = XEXP (op, 0);
+ rtx rhs = XEXP (op, 1);
+ enum rtx_code lcode = GET_CODE (lhs);
+ enum rtx_code rcode = GET_CODE (rhs);
+
+ /* Widening multiplies usually extend both operands, but sometimes
+ they use a shift to extract a portion of a register. */
+ if ((lcode == ZERO_EXTEND
+ || (lcode == LSHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
+ && (rcode == ZERO_EXTEND
+ || (rcode == LSHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
+ {
+ enum machine_mode lmode = GET_MODE (lhs);
+ enum machine_mode rmode = GET_MODE (rhs);
+ int bits;
+
+ if (lcode == LSHIFTRT)
+ /* Number of bits not shifted off the end. */
+ bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
+ else /* lcode == ZERO_EXTEND */
+ /* Size of inner mode. */
+ bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
+
+ if (rcode == LSHIFTRT)
+ bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
+ else /* rcode == ZERO_EXTEND */
+ bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
+
+ /* We can only widen multiplies if the result is mathematiclly
+ equivalent. I.e. if overflow was impossible. */
+ if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
+ return simplify_gen_binary
+ (MULT, mode,
+ simplify_gen_unary (ZERO_EXTEND, mode, lhs, lmode),
+ simplify_gen_unary (ZERO_EXTEND, mode, rhs, rmode));
+ }
+ }
+
 /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>). */
 if (GET_CODE (op) == ZERO_EXTEND)
 return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),

=== added file 'gcc/testsuite/gcc.target/arm/mla-2.c'
--- gcc/testsuite/gcc.target/arm/mla-2.c 1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/mla-2.c 2011-06-02 16:19:50 +0000
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long foolong (long long x, short *a, short *b)
+{
+ return x + *a * *b;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */

=== added file 'gcc/testsuite/gcc.target/arm/smlaltb-1.c'
--- gcc/testsuite/gcc.target/arm/smlaltb-1.c 1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/smlaltb-1.c 2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long int
+foo (long long x, int in)
+{
+ short a = in & 0xffff;
+ short b = (in & 0xffff0000) >> 16;
+
+ return x + b * a;
+}
+
+/* { dg-final { scan-assembler "smlaltb" } } */

=== added file 'gcc/testsuite/gcc.target/arm/smlaltt-1.c'
--- gcc/testsuite/gcc.target/arm/smlaltt-1.c 1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/smlaltt-1.c 2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long int
+foo (long long x, int in1, int in2)
+{
+ short a = (in1 & 0xffff0000) >> 16;
+ short b = (in2 & 0xffff0000) >> 16;
+
+ return x + b * a;
+}
+
+/* { dg-final { scan-assembler "smlaltt" } } */

=== added file 'gcc/testsuite/gcc.target/arm/smlatb-1.c'
--- gcc/testsuite/gcc.target/arm/smlatb-1.c 1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/smlatb-1.c 2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int x, int in)
+{
+ short a = in & 0xffff;
+ short b = (in & 0xffff0000) >> 16;
+
+ return x + b * a;
+}
+
+/* { dg-final { scan-assembler "smlatb" } } */

=== added file 'gcc/testsuite/gcc.target/arm/smlatt-1.c'
--- gcc/testsuite/gcc.target/arm/smlatt-1.c 1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/smlatt-1.c 2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int x, int in1, int in2)
+{
+ short a = (in1 & 0xffff0000) >> 16;
+ short b = (in2 & 0xffff0000) >> 16;
+
+ return x + b * a;
+}
+
+/* { dg-final { scan-assembler "smlatt" } } */
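
The precision test added to simplify_unary_operation_1 can be modelled outside the compiler. The following is a rough sketch in plain C, not GCC code: can_widen, extend_of_product and product_of_extends are hypothetical names, and the bit counts passed to can_widen stand in for the GET_MODE_PRECISION values and shift amounts used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the patch's check: the significant bits of both operands,
   added together, must fit in the precision of the narrow multiply. */
bool can_widen(int lhs_bits, int rhs_bits, int mult_precision)
{
    return lhs_bits + rhs_bits <= mult_precision;
}

/* Before: (sign_extend:DI (mult:SI (sign_extend:SI a) (sign_extend:SI b))) */
int64_t extend_of_product(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* 16 + 16 bits fit in 32 */
    return (int64_t)product;
}

/* After: (mult:DI (sign_extend:DI a) (sign_extend:DI b)) */
int64_t product_of_extends(int16_t a, int16_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

With two HImode operands, can_widen(16, 16, 32) holds, and the two functions agree for every pair of 16-bit inputs. A 32-bit operand shifted right by 16 also contributes 32 - 16 = 16 bits, which is the case the new top-half patterns rely on. With a full 32-bit operand, can_widen(32, 16, 32) fails and the rewrite would not be applied.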
