Merge into 4.6 : maddhidi4-4.6 : Code : Linaro GCC

Status:	Superseded
Proposed branch:	lp:~ams-codesourcery/gcc-linaro/maddhidi4-4.6
Merge into:	lp:gcc-linaro/4.6
Diff against target:	346 lines (+260/-0) (has conflicts)
	9 files modified
	ChangeLog.linaro (+35/-0)
	gcc/config/arm/arm.md (+63/-0)
	gcc/doc/md.texi (+17/-0)
	gcc/simplify-rtx.c (+84/-0)
	gcc/testsuite/gcc.target/arm/mla-2.c (+9/-0)
	gcc/testsuite/gcc.target/arm/smlaltb-1.c (+13/-0)
	gcc/testsuite/gcc.target/arm/smlaltt-1.c (+13/-0)
	gcc/testsuite/gcc.target/arm/smlatb-1.c (+13/-0)
	gcc/testsuite/gcc.target/arm/smlatt-1.c (+13/-0)
	Text conflict in ChangeLog.linaro
To merge this branch:	bzr merge lp:~ams-codesourcery/gcc-linaro/maddhidi4-4.6

Reviewer	Review Type	Date Requested	Status
Linaro Toolchain Builder		2011-05-18	Needs Fixing on 2011-05-19
Review via email: mp+61410@code.launchpad.net

This proposal has been superseded by a proposal from 2011-06-02.

Description of the change

A target-independent patch that improves combine's handling of HImode-to-DImode multiply-and-accumulate.

I've posted this patch upstream here:

http://<email address hidden>/msg05794.html

I'm waiting for upstream review, so I've submitted this merge proposal mostly to get the patch tested.
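
For illustration, the kind of source this targets can be sketched as below. The function names are hypothetical and not part of the patch. The sketch assumes a 32-bit int, as on ARM: C forms the 16x16 product in 32 bits and then sign-extends it for the 64-bit add, whereas a widening multiply-and-accumulate extends the operands first. The two agree because a 16x16 signed product always fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* C semantics: both shorts promote to int, the product is formed in
   32 bits, and only then is it sign-extended for the 64-bit add.  */
static int64_t madd_narrow(int64_t acc, int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return acc + (int64_t)prod;
}

/* What a widening multiply-and-accumulate computes: the operands are
   extended first and the product is formed directly in 64 bits.  */
static int64_t madd_wide(int64_t acc, int16_t a, int16_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}

/* Compare the two forms over a strided sample of 16-bit inputs
   (the strides are arbitrary) and count disagreements.  */
static long count_mismatches(void)
{
    long bad = 0;
    for (int a = INT16_MIN; a <= INT16_MAX; a += 251)
        for (int b = INT16_MIN; b <= INT16_MAX; b += 241) {
            int16_t x = (int16_t)a, y = (int16_t)b;
            if (madd_narrow(1000, x, y) != madd_wide(1000, x, y))
                bad++;
        }
    /* The corner case with the largest-magnitude product.  */
    assert(madd_narrow(0, INT16_MIN, INT16_MIN)
           == madd_wide(0, INT16_MIN, INT16_MIN));
    return bad;
}
```

count_mismatches() should find no disagreements, which is the property that lets the narrower form be rewritten as the wider one.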

Linaro Toolchain Builder (cbuild) wrote on 2011-05-18:

cbuild has taken a snapshot of this branch at r106750 and queued it for build.

The snapshot is available at:
http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6.tar.xdelta3.xz

and will be built on the following builders:
a9-builder i686 x86_64

You can track the build queue at:
http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106749
cbuild-state: check

Linaro Toolchain Builder (cbuild) wrote on 2011-05-19:

cbuild had trouble building this on i686-lucid-cbuild117-scorpius-i686r1.
See the following failure logs:
failed.txt gcc-build-failed.txt

under the build results at:
http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6/logs/i686-lucid-cbuild117-scorpius-i686r1

The test suite was not checked as this build has no .sum style test results

cbuild-checked: i686-lucid-cbuild117-scorpius-i686r1

review: Needs Fixing

Linaro Toolchain Builder (cbuild) wrote on 2011-05-19:

cbuild had trouble building this on x86_64-maverick-cbuild117-crucis-x86_64r1.
See the following failure logs:
failed.txt gcc-build-failed.txt

under the build results at:
http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6/logs/x86_64-maverick-cbuild117-crucis-x86_64r1

The test suite was not checked as this build has no .sum style test results

cbuild-checked: x86_64-maverick-cbuild117-crucis-x86_64r1

review: Needs Fixing

Michael Hope (michaelh1) wrote on 2011-05-19:

The i686 and x86_64 builds show similar errors, so the fault is probably real.

Linaro Toolchain Builder (cbuild) wrote on 2011-05-23:

cbuild successfully built this on armv7l-maverick-cbuild116-ursa4-cortexa9r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6/logs/armv7l-maverick-cbuild116-ursa4-cortexa9r1

The test suite results changed compared to the branch point lp:gcc-linaro/4.6+bzr106749:
 -PASS: gcc.dg/range-test-1.c execution test
 +FAIL: gcc.dg/range-test-1.c execution test
 -PASS: gcc.dg/torture/pr43017.c  -O2  execution test
 -PASS: gcc.dg/torture/pr43017.c  -O2 -flto  execution test
 -PASS: gcc.dg/torture/pr43017.c  -O2 -flto -flto-partition=none  execution test
 -PASS: gcc.dg/torture/pr43017.c  -O2 -flto -flto-partition=none  (test for excess errors)
 -PASS: gcc.dg/torture/pr43017.c  -O2 -flto  (test for excess errors)
 -PASS: gcc.dg/torture/pr43017.c  -O2  (test for excess errors)
 -PASS: gcc.dg/torture/pr43017.c  -O3 -fomit-frame-pointer  execution test
 -PASS: gcc.dg/torture/pr43017.c  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
 +UNRESOLVED: gcc.dg/torture/pr43017.c  -O2  compilation failed to produce executable
 +UNRESOLVED: gcc.dg/torture/pr43017.c  -O2 -flto  compilation failed to produce executable
 +UNRESOLVED: gcc.dg/torture/pr43017.c  -O2 -flto -flto-partition=none  compilation failed to produce executable
 +FAIL: gcc.dg/torture/pr43017.c  -O2 -flto -flto-partition=none  (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c  -O2 -flto -flto-partition=none  (test for excess errors)
 +FAIL: gcc.dg/torture/pr43017.c  -O2 -flto  (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c  -O2 -flto  (test for excess errors)
 +FAIL: gcc.dg/torture/pr43017.c  -O2  (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c  -O2  (test for excess errors)
 +FAIL: gcc.dg/torture/pr43017.c  -O3 -fomit-frame-pointer  execution test
 +FAIL: gcc.dg/torture/pr43017.c  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
 -PASS: gcc.dg/torture/pr43017.c  -O3 -fomit-frame-pointer -funroll-loops  execution test
 +FAIL: gcc.dg/torture/pr43017.c  -O3 -fomit-frame-pointer -funroll-loops  execution test
 -PASS: gcc.dg/torture/pr43017.c  -O3 -g  execution test
 +FAIL: gcc.dg/torture/pr43017.c  -O3 -g  execution test
 -PASS: gcc.dg/torture/pr43017.c  -Os  execution test
 -PASS: gcc.dg/torture/pr43017.c  -Os  (test for excess errors)
 +UNRESOLVED: gcc.dg/torture/pr43017.c  -Os  compilation failed to produce executable
 +FAIL: gcc.dg/torture/pr43017.c  -Os  (internal compiler error)
 +FAIL: gcc.dg/torture/pr43017.c  -Os  (test for excess errors)
 -PASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c execution test
 +UNRESOLVED: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c compilation failed to produce executable
 +FAIL: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (internal compiler error)
 -PASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (test for excess errors)
 +FAIL: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (test for excess errors)
 -PASS: gcc.dg/vect/pr20122.c execution test
 -PASS: gcc.dg/vect/pr20122.c scan-tree-dump-times vect "vectorized 1 loops" 3
 -PASS: gcc.dg/vect/pr20122.c (test for excess errors)
 +UNRESOLVED: gcc.dg/vect/pr20122.c compilation failed to produce executable
 ...and 100 more

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106750~ams-codesourcery~maddhidi4-4.6/logs/armv7l-maverick-cbuild116-ursa4-cortexa9r1/gcc-testsuite.txt

cbuild-checked: armv7l-maverick-cbuild116-ursa4-cortexa9r1

Preview Diff

 === modified file 'ChangeLog.linaro'
 --- ChangeLog.linaro	2011-06-02 12:12:00 +0000
 +++ ChangeLog.linaro	2011-06-02 16:19:50 +0000
@@ -1,3 +1,4 @@
++<<<<<<< TREE
  2011-06-02  Richard Sandiford  <richard.sandiford@linaro.org>
  	gcc/
@@ -336,6 +337,40 @@
  	* config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict FPA_REGS
  	case to VFPv1.
++=======
++2011-06-02  Andrew Stubbs  <ams@codesourcery.com>
++
++	Backport of patch proposed for FSF:
++
++	2011-05-27  Andrew Stubbs  <ams@codesourcery.com>
++
++	gcc/
++	* config/arm/arm.md (*maddhidi4tb, *maddhidi4tt): New define_insns.
++	(*maddhisi4tb, *maddhisi4tt): New define_insns.
++
++	gcc/testsuite/
++	* gcc.target/arm/smlatb-1.c: New file.
++	* gcc.target/arm/smlatt-1.c: New file.
++	* gcc.target/arm/smlaltb-1.c: New file.
++	* gcc.target/arm/smlaltt-1.c: New file.
++
++2011-06-02  Andrew Stubbs  <ams@codesourcery.com>
++
++	Backport of patch proposed for FSF:
++
++	2011-05-26  Bernd Schmidt  <bernds@codesourcery.com>
++		    Andrew Stubbs  <ams@codesourcery.com>
++
++	gcc/
++	* simplify-rtx.c (simplify_unary_operation_1): Canonicalize widening
++	multiplies.
++	* doc/md.texi (Canonicalization of Instructions): Document widening
++	multiply canonicalization.
++
++	gcc/testsuite/
++	* gcc.target/arm/mla-2.c: New test.
++
++>>>>>>> MERGE-SOURCE
  2011-05-26  Andrew Stubbs  <ams@codesourcery.com>
  	Merge from FSF GCC 4.6 (svn branches/gcc-4_6-branch 174261).
 === modified file 'gcc/config/arm/arm.md'
 --- gcc/config/arm/arm.md	2011-05-13 13:42:39 +0000
 +++ gcc/config/arm/arm.md	2011-06-02 16:19:50 +0000
@@ -1809,6 +1809,36 @@
     (set_attr "predicable" "yes")]
  )
++;; Note: there is no maddhisi4ibt because this one is canonical form
++(define_insn "*maddhisi4tb"
++  [(set (match_operand:SI 0 "s_register_operand" "=r")
++	(plus:SI (mult:SI (ashiftrt:SI
++			   (match_operand:SI 1 "s_register_operand" "r")
++			   (const_int 16))
++			  (sign_extend:SI
++			   (match_operand:HI 2 "s_register_operand" "r")))
++		 (match_operand:SI 3 "s_register_operand" "r")))]
++  "TARGET_DSP_MULTIPLY"
++  "smlatb%?\\t%0, %1, %2, %3"
++  [(set_attr "insn" "smlaxy")
++   (set_attr "predicable" "yes")]
++)
++
++(define_insn "*maddhisi4tt"
++  [(set (match_operand:SI 0 "s_register_operand" "=r")
++	(plus:SI (mult:SI (ashiftrt:SI
++			   (match_operand:SI 1 "s_register_operand" "r")
++			   (const_int 16))
++			  (ashiftrt:SI
++			   (match_operand:SI 2 "s_register_operand" "r")
++			   (const_int 16)))
++		 (match_operand:SI 3 "s_register_operand" "r")))]
++  "TARGET_DSP_MULTIPLY"
++  "smlatt%?\\t%0, %1, %2, %3"
++  [(set_attr "insn" "smlaxy")
++   (set_attr "predicable" "yes")]
++)
++
  (define_insn "*maddhidi4"
    [(set (match_operand:DI 0 "s_register_operand" "=r")
  	(plus:DI
@@ -1822,6 +1852,39 @@
    [(set_attr "insn" "smlalxy")
     (set_attr "predicable" "yes")])
++;; Note: there is no maddhidi4ibt because this one is canonical form
++(define_insn "*maddhidi4tb"
++  [(set (match_operand:DI 0 "s_register_operand" "=r")
++	(plus:DI
++	  (mult:DI (sign_extend:DI
++		    (ashiftrt:SI
++		     (match_operand:SI 1 "s_register_operand" "r")
++		     (const_int 16)))
++		   (sign_extend:DI
++		    (match_operand:HI 2 "s_register_operand" "r")))
++	  (match_operand:DI 3 "s_register_operand" "0")))]
++  "TARGET_DSP_MULTIPLY"
++  "smlaltb%?\\t%Q0, %R0, %1, %2"
++  [(set_attr "insn" "smlalxy")
++   (set_attr "predicable" "yes")])
++
++(define_insn "*maddhidi4tt"
++  [(set (match_operand:DI 0 "s_register_operand" "=r")
++	(plus:DI
++	  (mult:DI (sign_extend:DI
++		    (ashiftrt:SI
++		     (match_operand:SI 1 "s_register_operand" "r")
++		     (const_int 16)))
++		   (sign_extend:DI
++		    (ashiftrt:SI
++		     (match_operand:SI 2 "s_register_operand" "r")
++		     (const_int 16))))
++	  (match_operand:DI 3 "s_register_operand" "0")))]
++  "TARGET_DSP_MULTIPLY"
++  "smlaltt%?\\t%Q0, %R0, %1, %2"
++  [(set_attr "insn" "smlalxy")
++   (set_attr "predicable" "yes")])
++
  (define_expand "mulsf3"
    [(set (match_operand:SF          0 "s_register_operand" "")
  	(mult:SF (match_operand:SF 1 "s_register_operand" "")
 === modified file 'gcc/doc/md.texi'
 --- gcc/doc/md.texi	2011-05-05 15:43:06 +0000
 +++ gcc/doc/md.texi	2011-06-02 16:19:50 +0000
@@ -5929,6 +5929,23 @@
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations.
++@cindex @code{mult}, canonicalization of
++@item
++@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x})
++(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1}
++(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise
++for @code{zero_extend}.
++
++@item
++@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2}
++@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted
++to @code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2}
++@var{x} @var{s})) (sign_extend:@var{m1} @var{y}))}, and likewise for
++patterns using @code{zero_extend} and @code{lshiftrt}.  If the second
++operand of @code{mult} is also a shift, then that is extended also.
++This transformation is only applied when it can be proven that the
++original operation had sufficient precision to prevent overflow.
++
  @end itemize
  Further canonicalization rules are defined in the function
 === modified file 'gcc/simplify-rtx.c'
 --- gcc/simplify-rtx.c	2011-05-27 14:31:18 +0000
 +++ gcc/simplify-rtx.c	2011-06-02 16:19:50 +0000
@@ -1000,6 +1000,48 @@
  	  && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
  	return XEXP (op, 0);
++      /* Extending a widening multiplication should be canonicalized to
++	 a wider widening multiplication.  */
++      if (GET_CODE (op) == MULT)
++	{
++	  rtx lhs = XEXP (op, 0);
++	  rtx rhs = XEXP (op, 1);
++	  enum rtx_code lcode = GET_CODE (lhs);
++	  enum rtx_code rcode = GET_CODE (rhs);
++
++	  /* Widening multiplies usually extend both operands, but sometimes
++	     they use a shift to extract a portion of a register.  */
++	  if ((lcode == SIGN_EXTEND
++	       || (lcode == ASHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
++	      && (rcode == SIGN_EXTEND
++		  || (rcode == ASHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
++	    {
++	      enum machine_mode lmode = GET_MODE (lhs);
++	      enum machine_mode rmode = GET_MODE (rhs);
++	      int bits;
++
++	      if (lcode == ASHIFTRT)
++		/* Number of bits not shifted off the end.  */
++		bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
++	      else /* lcode == SIGN_EXTEND */
++		/* Size of inner mode.  */
++		bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
++
++	      if (rcode == ASHIFTRT)
++		bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
++	      else /* rcode == SIGN_EXTEND */
++		bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
++
++	      /* We can only widen multiplies if the result is mathematically
++		 equivalent.  I.e. if overflow was impossible.  */
++	      if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
++		return simplify_gen_binary
++			 (MULT, mode,
++			  simplify_gen_unary (SIGN_EXTEND, mode, lhs, lmode),
++			  simplify_gen_unary (SIGN_EXTEND, mode, rhs, rmode));
++	    }
++	}
++
        /* Check for a sign extension of a subreg of a promoted
  	 variable, where the promotion is sign-extended, and the
  	 target mode is the same as the variable's promotion.  */
@@ -1071,6 +1113,48 @@
  	  && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0))))
  	return rtl_hooks.gen_lowpart_no_emit (mode, op);
++      /* Extending a widening multiplication should be canonicalized to
++	 a wider widening multiplication.  */
++      if (GET_CODE (op) == MULT)
++	{
++	  rtx lhs = XEXP (op, 0);
++	  rtx rhs = XEXP (op, 1);
++	  enum rtx_code lcode = GET_CODE (lhs);
++	  enum rtx_code rcode = GET_CODE (rhs);
++
++	  /* Widening multiplies usually extend both operands, but sometimes
++	     they use a shift to extract a portion of a register.  */
++	  if ((lcode == ZERO_EXTEND
++	       || (lcode == LSHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
++	      && (rcode == ZERO_EXTEND
++		  || (rcode == LSHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
++	    {
++	      enum machine_mode lmode = GET_MODE (lhs);
++	      enum machine_mode rmode = GET_MODE (rhs);
++	      int bits;
++
++	      if (lcode == LSHIFTRT)
++		/* Number of bits not shifted off the end.  */
++		bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
++	      else /* lcode == ZERO_EXTEND */
++		/* Size of inner mode.  */
++		bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
++
++	      if (rcode == LSHIFTRT)
++		bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
++	      else /* rcode == ZERO_EXTEND */
++		bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
++
++	      /* We can only widen multiplies if the result is mathematically
++		 equivalent.  I.e. if overflow was impossible.  */
++	      if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
++		return simplify_gen_binary
++			 (MULT, mode,
++			  simplify_gen_unary (ZERO_EXTEND, mode, lhs, lmode),
++			  simplify_gen_unary (ZERO_EXTEND, mode, rhs, rmode));
++	    }
++	}
++
        /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>).  */
        if (GET_CODE (op) == ZERO_EXTEND)
  	return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),
 === added file 'gcc/testsuite/gcc.target/arm/mla-2.c'
 --- gcc/testsuite/gcc.target/arm/mla-2.c	1970-01-01 00:00:00 +0000
 +++ gcc/testsuite/gcc.target/arm/mla-2.c	2011-06-02 16:19:50 +0000
@@ -0,0 +1,9 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -march=armv7-a" } */
++
++long long foolong (long long x, short *a, short *b)
++{
++    return x + *a * *b;
++}
++
++/* { dg-final { scan-assembler "smlalbb" } } */
 === added file 'gcc/testsuite/gcc.target/arm/smlaltb-1.c'
 --- gcc/testsuite/gcc.target/arm/smlaltb-1.c	1970-01-01 00:00:00 +0000
 +++ gcc/testsuite/gcc.target/arm/smlaltb-1.c	2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -march=armv7-a" } */
++
++long long int
++foo (long long x, int in)
++{
++  short a = in & 0xffff;
++  short b = (in & 0xffff0000) >> 16;
++
++  return x + b * a;
++}
++
++/* { dg-final { scan-assembler "smlaltb" } } */
 === added file 'gcc/testsuite/gcc.target/arm/smlaltt-1.c'
 --- gcc/testsuite/gcc.target/arm/smlaltt-1.c	1970-01-01 00:00:00 +0000
 +++ gcc/testsuite/gcc.target/arm/smlaltt-1.c	2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -march=armv7-a" } */
++
++long long int
++foo (long long x, int in1, int in2)
++{
++  short a = (in1 & 0xffff0000) >> 16;
++  short b = (in2 & 0xffff0000) >> 16;
++
++  return x + b * a;
++}
++
++/* { dg-final { scan-assembler "smlaltt" } } */
 === added file 'gcc/testsuite/gcc.target/arm/smlatb-1.c'
 --- gcc/testsuite/gcc.target/arm/smlatb-1.c	1970-01-01 00:00:00 +0000
 +++ gcc/testsuite/gcc.target/arm/smlatb-1.c	2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -march=armv7-a" } */
++
++int
++foo (int x, int in)
++{
++  short a = in & 0xffff;
++  short b = (in & 0xffff0000) >> 16;
++
++  return x + b * a;
++}
++
++/* { dg-final { scan-assembler "smlatb" } } */
 === added file 'gcc/testsuite/gcc.target/arm/smlatt-1.c'
 --- gcc/testsuite/gcc.target/arm/smlatt-1.c	1970-01-01 00:00:00 +0000
 +++ gcc/testsuite/gcc.target/arm/smlatt-1.c	2011-06-02 16:19:50 +0000
@@ -0,0 +1,13 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -march=armv7-a" } */
++
++int
++foo (int x, int in1, int in2)
++{
++  short a = (in1 & 0xffff0000) >> 16;
++  short b = (in2 & 0xffff0000) >> 16;
++
++  return x + b * a;
++}
++
++/* { dg-final { scan-assembler "smlatt" } } */
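
The overflow test in the simplify-rtx.c hunks can be modelled outside GCC roughly as below. This is a standalone sketch, not GCC code: the struct, enum, and function names are made up for illustration, and only the arithmetic mirrors the patch. Each operand contributes either the precision of the mode being extended (for an extend) or the operand's precision minus the constant shift count (for a right shift), and widening is allowed only when the sum fits in the precision of the multiply's mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two operand shapes the patch accepts.  */
enum operand_kind { OPERAND_EXTEND, OPERAND_SHIFT };

struct operand {
    enum operand_kind kind;
    int outer_bits;   /* precision of the operand's own mode */
    int inner_bits;   /* OPERAND_EXTEND: precision of the extended mode */
    int shift;        /* OPERAND_SHIFT: constant right-shift count */
};

/* Bits that can still carry information after the extend or shift.  */
static int significant_bits(struct operand op)
{
    if (op.kind == OPERAND_SHIFT)
        return op.outer_bits - op.shift;   /* bits not shifted off the end */
    return op.inner_bits;                  /* size of the inner mode */
}

/* Widening is safe only if the product cannot overflow the mode in
   which the original multiply was performed.  */
static bool can_widen_multiply(struct operand lhs, struct operand rhs,
                               int mult_bits)
{
    return significant_bits(lhs) + significant_bits(rhs) <= mult_bits;
}
```

Under this model, an SImode multiply of a value shifted right by 16 and a sign-extended HImode value counts 16 + 16 = 32 bits and qualifies, while a shift by only 8 counts 24 + 16 = 40 bits and does not.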