Merge lp:~ams-codesourcery/gcc-linaro/lp663939 into lp:gcc-linaro/4.6

Proposed by Andrew Stubbs
Status: Superseded
Proposed branch: lp:~ams-codesourcery/gcc-linaro/lp663939
Merge into: lp:gcc-linaro/4.6
Diff against target: 951 lines (+536/-207) (has conflicts)
9 files modified
ChangeLog.linaro (+56/-0)
gcc/config/arm/arm-protos.h (+1/-0)
gcc/config/arm/arm.c (+301/-197)
gcc/config/arm/arm.md (+13/-9)
gcc/config/arm/constraints.md (+13/-1)
gcc/testsuite/gcc.target/arm/thumb2-replicated-constant1.c (+27/-0)
gcc/testsuite/gcc.target/arm/thumb2-replicated-constant2.c (+75/-0)
gcc/testsuite/gcc.target/arm/thumb2-replicated-constant3.c (+28/-0)
gcc/testsuite/gcc.target/arm/thumb2-replicated-constant4.c (+22/-0)
Text conflict in ChangeLog.linaro
To merge this branch: bzr merge lp:~ams-codesourcery/gcc-linaro/lp663939
Reviewer: Linaro Toolchain Builder (status: Pending)
Review via email: mp+66916@code.launchpad.net

This proposal supersedes a proposal from 2011-06-02.

This proposal has been superseded by a proposal from 2011-07-11.

Description of the change

These patches improve support for Thumb-2 replicated constants, add support for the ADDW and SUBW instructions, and ensure that the most efficient form of each constant is used (normal, inverted, or negated).

This addresses the problems identified in LP:663939.
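
As a minimal sketch of the kind of code this affects (the function names below are invented for illustration, and the expected instruction choices are assumptions based on the new testcases in the diff, gcc.target/arm/thumb2-replicated-constant*.c, built with -mthumb -O2 on a Thumb-2 target):

  int
  add_split (int a)
  {
    /* 0xfe00fe01 does not fit one immediate; the rewritten splitting code
       should use the replicated pattern 0xfe00fe00 plus 1.  */
    return a + 0xfe00fe01;
  }

  int
  add_narrow (int a)
  {
    /* 0xfff is not an 8-bit rotated/shifted immediate, but it fits the
       12-bit ADDW form added by this series.  */
    return a + 0xfff;
  }

  int
  or_inverted (int a)
  {
    /* Here the inverted constant is cheaper: ORN with 0xff rather than
       materializing 0xffffff00.  */
    return a | 0xffffff00;
  }

The general idea is that the constant-splitting code now weighs replicated and 12-bit forms against the plain 8-bit rotated immediates, and also compares the normal, inverted, and negated senses, keeping whichever needs the fewest instructions.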

It has now been approved upstream in this new form.

Update: testing found a bug. I believe it is fixed now, and I have overwritten this branch with the corrected patch.

Revision history for this message
Loïc Minier (lool) wrote : Posted in a previous version of this proposal

Looks like we should ping upstream again here

Revision history for this message
Loïc Minier (lool) wrote : Posted in a previous version of this proposal

11:57 < lool> ams_cs: Is
https://code.launchpad.net/~ams-codesourcery/gcc-linaro/lp663939/+merge/45750
              still work in progress? It seems really old now
[...]
11:58 < ams_cs> lool: last activity 12th april
11:58 < ams_cs> lool: I have to do some reworking
11:58 < ams_cs> lool: I've also discussed this patch with Ramana quite a bit

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote : Posted in a previous version of this proposal

cbuild has taken a snapshot of this branch at r106756 and queued it for build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939.tar.xdelta3.xz

and will be built on the following builders:
 a9-builder i686 x86_64

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106752
cbuild-state: check

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote : Posted in a previous version of this proposal

cbuild successfully built this on i686-lucid-cbuild123-scorpius-i686r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939/logs/i686-lucid-cbuild123-scorpius-i686r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106752.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939/logs/i686-lucid-cbuild123-scorpius-i686r1/gcc-testsuite.txt

cbuild-checked: i686-lucid-cbuild123-scorpius-i686r1

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote : Posted in a previous version of this proposal

cbuild had trouble building this on armv7l-maverick-cbuild123-ursa4-cortexa9r1.
See the following failure logs:
 failed.txt gcc-build-failed.txt

under the build results at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939/logs/armv7l-maverick-cbuild123-ursa4-cortexa9r1

The test suite was not checked, as this build has no .sum-style test results.

cbuild-checked: armv7l-maverick-cbuild123-ursa4-cortexa9r1

review: Needs Fixing
Revision history for this message
Linaro Toolchain Builder (cbuild) wrote : Posted in a previous version of this proposal

cbuild successfully built this on x86_64-maverick-cbuild123-crucis-x86_64r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939/logs/x86_64-maverick-cbuild123-crucis-x86_64r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106752.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939/logs/x86_64-maverick-cbuild123-crucis-x86_64r1/gcc-testsuite.txt

cbuild-checked: x86_64-maverick-cbuild123-crucis-x86_64r1

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild has taken a snapshot of this branch at r106756 and queued it for build.

The snapshot is available at:
 http://ex.seabright.co.nz/snapshots/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939.tar.xdelta3.xz

and will be built on the following builders:

You can track the build queue at:
 http://ex.seabright.co.nz/helpers/scheduler

cbuild-snapshot: gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939
cbuild-ancestor: lp:gcc-linaro/4.6+bzr106752
cbuild-state: check

Revision history for this message
Linaro Toolchain Builder (cbuild) wrote :

cbuild successfully built this on i686-lucid-cbuild123-scorpius-i686r1.

The build results are available at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939/logs/i686-lucid-cbuild123-scorpius-i686r1

The test suite results were unchanged compared to the branch point lp:gcc-linaro/4.6+bzr106752.

The full testsuite results are at:
 http://ex.seabright.co.nz/build/gcc-linaro-4.6+bzr106756~ams-codesourcery~lp663939/logs/i686-lucid-cbuild123-scorpius-i686r1/gcc-testsuite.txt

cbuild-checked: i686-lucid-cbuild123-scorpius-i686r1

Preview Diff

1=== modified file 'ChangeLog.linaro'
2--- ChangeLog.linaro 2011-07-04 11:13:51 +0000
3+++ ChangeLog.linaro 2011-07-05 15:18:53 +0000
4@@ -1,3 +1,4 @@
5+<<<<<<< TREE
6 2011-07-01 Andrew Stubbs <ams@codesourcery.com>
7
8 Merge from FSF GCC 4.6.1 (svn branches/gcc-4_6-branch 175677).
9@@ -635,6 +636,61 @@
10 * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict FPA_REGS
11 case to VFPv1.
12
13+=======
14+2011-07-05 Andrew Stubbs <ams@codesourcery.com>
15+
16+ Backport of update to patch proposed for FSF:
17+
18+ 2011-05-09 Andrew Stubbs <ams@codesourcery.com>
19+
20+ gcc/
21+ * config/arm/arm.c (struct four_ints): New type.
22+ (count_insns_for_constant): Delete function.
23+ (find_best_start): Delete function.
24+ (optimal_immediate_sequence): New function.
25+ (optimal_immediate_sequence_1): New function.
26+ (arm_gen_constant): Move constant splitting code to
27+ optimal_immediate_sequence.
28+ Rewrite constant negation/inversion code.
29+
30+ gcc/testsuite/
31+ * gcc.target/arm/thumb2-replicated-constant1.c: New file.
32+ * gcc.target/arm/thumb2-replicated-constant2.c: New file.
33+ * gcc.target/arm/thumb2-replicated-constant3.c: New file.
34+ * gcc.target/arm/thumb2-replicated-constant4.c: New file.
35+
36+2011-06-02 Andrew Stubbs <ams@codesourcery.com>
37+
38+ Backport of patch proposed for FSF:
39+
40+ 2011-06-02 Andrew Stubbs <ams@codesourcery.com>
41+
42+ gcc/
43+ * config/arm/arm-protos.h (const_ok_for_op): Add prototype.
44+ * config/arm/arm.c (const_ok_for_op): Add support for addw/subw.
45+ Remove prototype. Remove static function type.
46+ * config/arm/arm.md (*arm_addsi3): Add addw/subw support.
47+ Add arch attribute.
48+ * config/arm/constraints.md (Pj, PJ): New constraints.
49+
50+2011-06-02 Andrew Stubbs <ams@codesourcery.com>
51+
52+ Backport from FSF:
53+
54+ 2011-04-20 Andrew Stubbs <ams@codesourcery.com>
55+
56+ * config/arm/arm.c (arm_gen_constant): Move movw support ....
57+ (const_ok_for_op): ... to here.
58+
59+2011-06-02 Andrew Stubbs <ams@codesourcery.com>
60+
61+ Backport from FSF:
62+
63+ 2011-04-20 Andrew Stubbs <ams@codesourcery.com>
64+
65+ * config/arm/arm.c (arm_gen_constant): Remove redundant can_invert.
66+
67+>>>>>>> MERGE-SOURCE
68 2011-05-26 Andrew Stubbs <ams@codesourcery.com>
69
70 Merge from FSF GCC 4.6 (svn branches/gcc-4_6-branch 174261).
71
72=== modified file 'gcc/config/arm/arm-protos.h'
73--- gcc/config/arm/arm-protos.h 2011-06-14 16:00:30 +0000
74+++ gcc/config/arm/arm-protos.h 2011-07-05 15:18:53 +0000
75@@ -46,6 +46,7 @@
76 extern bool arm_small_register_classes_for_mode_p (enum machine_mode);
77 extern int arm_hard_regno_mode_ok (unsigned int, enum machine_mode);
78 extern int const_ok_for_arm (HOST_WIDE_INT);
79+extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
80 extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx,
81 HOST_WIDE_INT, rtx, rtx, int);
82 extern RTX_CODE arm_canonicalize_comparison (RTX_CODE, rtx *, rtx *);
83
84=== modified file 'gcc/config/arm/arm.c'
85--- gcc/config/arm/arm.c 2011-06-29 09:13:17 +0000
86+++ gcc/config/arm/arm.c 2011-07-05 15:18:53 +0000
87@@ -63,6 +63,11 @@
88
89 void (*arm_lang_output_object_attributes_hook)(void);
90
91+struct four_ints
92+{
93+ int i[4];
94+};
95+
96 /* Forward function declarations. */
97 static bool arm_needs_doubleword_align (enum machine_mode, const_tree);
98 static int arm_compute_static_chain_stack_bytes (void);
99@@ -81,7 +86,6 @@
100 static bool arm_legitimate_address_p (enum machine_mode, rtx, bool);
101 static int thumb_far_jump_used_p (void);
102 static bool thumb_force_lr_save (void);
103-static int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
104 static rtx emit_sfm (int, int);
105 static unsigned arm_size_return_regs (void);
106 static bool arm_assemble_integer (rtx, unsigned int, int);
107@@ -129,7 +133,13 @@
108 static int arm_comp_type_attributes (const_tree, const_tree);
109 static void arm_set_default_type_attributes (tree);
110 static int arm_adjust_cost (rtx, rtx, rtx, int);
111-static int count_insns_for_constant (HOST_WIDE_INT, int);
112+static int optimal_immediate_sequence (enum rtx_code code,
113+ unsigned HOST_WIDE_INT val,
114+ struct four_ints *return_sequence);
115+static int optimal_immediate_sequence_1 (enum rtx_code code,
116+ unsigned HOST_WIDE_INT val,
117+ struct four_ints *return_sequence,
118+ int i);
119 static int arm_get_strip_length (int);
120 static bool arm_function_ok_for_sibcall (tree, tree);
121 static enum machine_mode arm_promote_function_mode (const_tree,
122@@ -2522,7 +2532,7 @@
123 }
124
125 /* Return true if I is a valid constant for the operation CODE. */
126-static int
127+int
128 const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
129 {
130 if (const_ok_for_arm (i))
131@@ -2530,7 +2540,21 @@
132
133 switch (code)
134 {
135+ case SET:
136+ /* See if we can use movw. */
137+ if (arm_arch_thumb2 && (i & 0xffff0000) == 0)
138+ return 1;
139+ else
140+ return 0;
141+
142 case PLUS:
143+ /* See if we can use addw or subw. */
144+ if (TARGET_THUMB2
145+ && ((i & 0xfffff000) == 0
146+ || ((-i) & 0xfffff000) == 0))
147+ return 1;
148+ /* else fall through. */
149+
150 case COMPARE:
151 case EQ:
152 case NE:
153@@ -2646,68 +2670,41 @@
154 1);
155 }
156
157-/* Return the number of instructions required to synthesize the given
158- constant, if we start emitting them from bit-position I. */
159-static int
160-count_insns_for_constant (HOST_WIDE_INT remainder, int i)
161-{
162- HOST_WIDE_INT temp1;
163- int step_size = TARGET_ARM ? 2 : 1;
164- int num_insns = 0;
165-
166- gcc_assert (TARGET_ARM || i == 0);
167-
168- do
169- {
170- int end;
171-
172- if (i <= 0)
173- i += 32;
174- if (remainder & (((1 << step_size) - 1) << (i - step_size)))
175- {
176- end = i - 8;
177- if (end < 0)
178- end += 32;
179- temp1 = remainder & ((0x0ff << end)
180- | ((i < end) ? (0xff >> (32 - end)) : 0));
181- remainder &= ~temp1;
182- num_insns++;
183- i -= 8 - step_size;
184- }
185- i -= step_size;
186- } while (remainder);
187- return num_insns;
188-}
189-
190-static int
191-find_best_start (unsigned HOST_WIDE_INT remainder)
192+/* Return a sequence of integers, in RETURN_SEQUENCE that fit into
193+ ARM/THUMB2 immediates, and add up to VAL.
194+ The function return value gives the number of insns required. */
195+static int
196+optimal_immediate_sequence (enum rtx_code code, unsigned HOST_WIDE_INT val,
197+ struct four_ints *return_sequence)
198 {
199 int best_consecutive_zeros = 0;
200 int i;
201 int best_start = 0;
202+ int insns1, insns2;
203+ struct four_ints tmp_sequence;
204
205 /* If we aren't targetting ARM, the best place to start is always at
206- the bottom. */
207- if (! TARGET_ARM)
208- return 0;
209-
210- for (i = 0; i < 32; i += 2)
211+ the bottom, otherwise look more closely. */
212+ if (TARGET_ARM)
213 {
214- int consecutive_zeros = 0;
215-
216- if (!(remainder & (3 << i)))
217+ for (i = 0; i < 32; i += 2)
218 {
219- while ((i < 32) && !(remainder & (3 << i)))
220- {
221- consecutive_zeros += 2;
222- i += 2;
223- }
224- if (consecutive_zeros > best_consecutive_zeros)
225- {
226- best_consecutive_zeros = consecutive_zeros;
227- best_start = i - consecutive_zeros;
228- }
229- i -= 2;
230+ int consecutive_zeros = 0;
231+
232+ if (!(val & (3 << i)))
233+ {
234+ while ((i < 32) && !(val & (3 << i)))
235+ {
236+ consecutive_zeros += 2;
237+ i += 2;
238+ }
239+ if (consecutive_zeros > best_consecutive_zeros)
240+ {
241+ best_consecutive_zeros = consecutive_zeros;
242+ best_start = i - consecutive_zeros;
243+ }
244+ i -= 2;
245+ }
246 }
247 }
248
249@@ -2734,13 +2731,161 @@
250 the constant starting from `best_start', and also starting from
251 zero (i.e. with bit 31 first to be output). If `best_start' doesn't
252 yield a shorter sequence, we may as well use zero. */
253+ insns1 = optimal_immediate_sequence_1 (code, val, return_sequence, best_start);
254 if (best_start != 0
255- && ((((unsigned HOST_WIDE_INT) 1) << best_start) < remainder)
256- && (count_insns_for_constant (remainder, 0) <=
257- count_insns_for_constant (remainder, best_start)))
258- best_start = 0;
259-
260- return best_start;
261+ && ((((unsigned HOST_WIDE_INT) 1) << best_start) < val))
262+ {
263+ insns2 = optimal_immediate_sequence_1 (code, val, &tmp_sequence, 0);
264+ if (insns2 <= insns1)
265+ {
266+ *return_sequence = tmp_sequence;
267+ insns1 = insns2;
268+ }
269+ }
270+
271+ return insns1;
272+}
273+
274+/* As for optimal_immediate_sequence, but starting at bit-position I. */
275+static int
276+optimal_immediate_sequence_1 (enum rtx_code code, unsigned HOST_WIDE_INT val,
277+ struct four_ints *return_sequence, int i)
278+{
279+ int remainder = val & 0xffffffff;
280+ int insns = 0;
281+
282+ /* Try and find a way of doing the job in either two or three
283+ instructions.
284+
285+ In ARM mode we can use 8-bit constants, rotated to any 2-bit aligned
286+ location. We start at position I. This may be the MSB, or
287+ optimal_immediate_sequence may have positioned it at the largest block
288+ of zeros that are aligned on a 2-bit boundary. We then fill up the temps,
289+ wrapping around to the top of the word when we drop off the bottom.
290+ In the worst case this code should produce no more than four insns.
291+
292+ In Thumb2 mode, we can use 32/16-bit replicated constants, and 8-bit
293+ constants, shifted to any arbitrary location. We should always start
294+ at the MSB. */
295+ do
296+ {
297+ int end;
298+ int b1, b2, b3, b4;
299+ unsigned HOST_WIDE_INT result;
300+ int loc;
301+
302+ gcc_assert (insns < 4);
303+
304+ if (i <= 0)
305+ i += 32;
306+
307+ /* First, find the next normal 12/8-bit shifted/rotated immediate. */
308+ if (remainder & ((TARGET_ARM ? (3 << (i - 2)) : (1 << (i - 1)))))
309+ {
310+ loc = i;
311+ if (i <= 12 && TARGET_THUMB2 && code == PLUS)
312+ /* We can use addw/subw for the last 12 bits. */
313+ result = remainder;
314+ else
315+ {
316+ /* Use an 8-bit shifted/rotated immediate. */
317+ end = i - 8;
318+ if (end < 0)
319+ end += 32;
320+ result = remainder & ((0x0ff << end)
321+ | ((i < end) ? (0xff >> (32 - end))
322+ : 0));
323+ i -= 8;
324+ }
325+ }
326+ else
327+ {
328+ /* Arm allows rotates by a multiple of two. Thumb-2 allows
329+ arbitrary shifts. */
330+ i -= TARGET_ARM ? 2 : 1;
331+ continue;
332+ }
333+
334+ /* Next, see if we can do a better job with a thumb2 replicated
335+ constant.
336+
337+ We do it this way around to catch the cases like 0x01F001E0 where
338+ two 8-bit immediates would work, but a replicated constant would
339+ make it worse.
340+
341+ TODO: 16-bit constants that don't clear all the bits, but still win.
342+ TODO: Arithmetic splitting for set/add/sub, rather than bitwise. */
343+ if (TARGET_THUMB2)
344+ {
345+ b1 = (remainder & 0xff000000) >> 24;
346+ b2 = (remainder & 0x00ff0000) >> 16;
347+ b3 = (remainder & 0x0000ff00) >> 8;
348+ b4 = remainder & 0xff;
349+
350+ if (loc > 24)
351+ {
352+ /* The 8-bit immediate already found clears b1 (and maybe b2),
353+ but must leave b3 and b4 alone. */
354+
355+ /* First try to find a 32-bit replicated constant that clears
356+ almost everything. We can assume that we can't do it in one,
357+ or else we wouldn't be here. */
358+ unsigned int tmp = b1 & b2 & b3 & b4;
359+ unsigned int tmp2 = tmp + (tmp << 8) + (tmp << 16)
360+ + (tmp << 24);
361+ unsigned int matching_bytes = (tmp == b1) + (tmp == b2)
362+ + (tmp == b3) + (tmp == b4);
363+ if (tmp
364+ && (matching_bytes >= 3
365+ || (matching_bytes == 2
366+ && const_ok_for_op (remainder & ~tmp2, code))))
367+ {
368+ /* At least 3 of the bytes match, and the fourth has at
369+ least as many bits set, or two of the bytes match
370+ and it will only require one more insn to finish. */
371+ result = tmp2;
372+ i = tmp != b1 ? 32
373+ : tmp != b2 ? 24
374+ : tmp != b3 ? 16
375+ : 8;
376+ }
377+
378+ /* Second, try to find a 16-bit replicated constant that can
379+ leave three of the bytes clear. If b2 or b4 is already
380+ zero, then we can. If the 8-bit from above would not
381+ clear b2 anyway, then we still win. */
382+ else if (b1 == b3 && (!b2 || !b4
383+ || (remainder & 0x00ff0000 & ~result)))
384+ {
385+ result = remainder & 0xff00ff00;
386+ i = 24;
387+ }
388+ }
389+ else if (loc > 16)
390+ {
391+ /* The 8-bit immediate already found clears b2 (and maybe b3)
392+ and we don't get here unless b1 is already clear, but it will
393+ leave b4 unchanged. */
394+
395+ /* If we can clear b2 and b4 at once, then we win, since the
396+ 8-bits couldn't possibly reach that far. */
397+ if (b2 == b4)
398+ {
399+ result = remainder & 0x00ff00ff;
400+ i = 16;
401+ }
402+ }
403+ }
404+
405+ return_sequence->i[insns++] = result;
406+ remainder &= ~result;
407+
408+ if (code == SET || code == MINUS)
409+ code = PLUS;
410+ }
411+ while (remainder);
412+
413+ return insns;
414 }
415
416 /* Emit an instruction with the indicated PATTERN. If COND is
417@@ -2757,7 +2902,6 @@
418
419 /* As above, but extra parameter GENERATE which, if clear, suppresses
420 RTL generation. */
421-/* ??? This needs more work for thumb2. */
422
423 static int
424 arm_gen_constant (enum rtx_code code, enum machine_mode mode, rtx cond,
425@@ -2769,15 +2913,15 @@
426 int final_invert = 0;
427 int can_negate_initial = 0;
428 int i;
429- int num_bits_set = 0;
430 int set_sign_bit_copies = 0;
431 int clear_sign_bit_copies = 0;
432 int clear_zero_bit_copies = 0;
433 int set_zero_bit_copies = 0;
434- int insns = 0;
435+ int insns = 0, neg_insns, inv_insns;
436 unsigned HOST_WIDE_INT temp1, temp2;
437 unsigned HOST_WIDE_INT remainder = val & 0xffffffff;
438- int step_size = TARGET_ARM ? 2 : 1;
439+ struct four_ints *immediates;
440+ struct four_ints pos_immediates, neg_immediates, inv_immediates;
441
442 /* Find out which operations are safe for a given CODE. Also do a quick
443 check for degenerate cases; these can occur when DImode operations
444@@ -2814,9 +2958,6 @@
445 gen_rtx_SET (VOIDmode, target, source));
446 return 1;
447 }
448-
449- if (TARGET_THUMB2)
450- can_invert = 1;
451 break;
452
453 case AND:
454@@ -2880,7 +3021,6 @@
455 source)));
456 return 1;
457 }
458- can_negate = 1;
459
460 break;
461
462@@ -2889,9 +3029,7 @@
463 }
464
465 /* If we can do it in one insn get out quickly. */
466- if (const_ok_for_arm (val)
467- || (can_negate_initial && const_ok_for_arm (-val))
468- || (can_invert && const_ok_for_arm (~val)))
469+ if (const_ok_for_op (val, code))
470 {
471 if (generate)
472 emit_constant_insn (cond,
473@@ -2944,15 +3082,6 @@
474 switch (code)
475 {
476 case SET:
477- /* See if we can use movw. */
478- if (arm_arch_thumb2 && (remainder & 0xffff0000) == 0)
479- {
480- if (generate)
481- emit_constant_insn (cond, gen_rtx_SET (VOIDmode, target,
482- GEN_INT (val)));
483- return 1;
484- }
485-
486 /* See if we can do this by sign_extending a constant that is known
487 to be negative. This is a good, way of doing it, since the shift
488 may well merge into a subsequent insn. */
489@@ -3303,121 +3432,96 @@
490 break;
491 }
492
493- for (i = 0; i < 32; i++)
494- if (remainder & (1 << i))
495- num_bits_set++;
496-
497- if ((code == AND)
498- || (code != IOR && can_invert && num_bits_set > 16))
499- remainder ^= 0xffffffff;
500- else if (code == PLUS && num_bits_set > 16)
501- remainder = (-remainder) & 0xffffffff;
502-
503- /* For XOR, if more than half the bits are set and there's a sequence
504- of more than 8 consecutive ones in the pattern then we can XOR by the
505- inverted constant and then invert the final result; this may save an
506- instruction and might also lead to the final mvn being merged with
507- some other operation. */
508- else if (code == XOR && num_bits_set > 16
509- && (count_insns_for_constant (remainder ^ 0xffffffff,
510- find_best_start
511- (remainder ^ 0xffffffff))
512- < count_insns_for_constant (remainder,
513- find_best_start (remainder))))
514- {
515- remainder ^= 0xffffffff;
516- final_invert = 1;
517- }
518- else
519- {
520- can_invert = 0;
521- can_negate = 0;
522- }
523-
524- /* Now try and find a way of doing the job in either two or three
525- instructions.
526- We start by looking for the largest block of zeros that are aligned on
527- a 2-bit boundary, we then fill up the temps, wrapping around to the
528- top of the word when we drop off the bottom.
529- In the worst case this code should produce no more than four insns.
530- Thumb-2 constants are shifted, not rotated, so the MSB is always the
531- best place to start. */
532-
533- /* ??? Use thumb2 replicated constants when the high and low halfwords are
534- the same. */
535- {
536- /* Now start emitting the insns. */
537- i = find_best_start (remainder);
538- do
539- {
540- int end;
541-
542- if (i <= 0)
543- i += 32;
544- if (remainder & (3 << (i - 2)))
545- {
546- end = i - 8;
547- if (end < 0)
548- end += 32;
549- temp1 = remainder & ((0x0ff << end)
550- | ((i < end) ? (0xff >> (32 - end)) : 0));
551- remainder &= ~temp1;
552-
553- if (generate)
554- {
555- rtx new_src, temp1_rtx;
556-
557- if (code == SET || code == MINUS)
558- {
559- new_src = (subtargets ? gen_reg_rtx (mode) : target);
560- if (can_invert && code != MINUS)
561- temp1 = ~temp1;
562- }
563- else
564- {
565- if ((final_invert || remainder) && subtargets)
566- new_src = gen_reg_rtx (mode);
567- else
568- new_src = target;
569- if (can_invert)
570- temp1 = ~temp1;
571- else if (can_negate)
572- temp1 = -temp1;
573- }
574-
575- temp1 = trunc_int_for_mode (temp1, mode);
576- temp1_rtx = GEN_INT (temp1);
577-
578- if (code == SET)
579- ;
580- else if (code == MINUS)
581- temp1_rtx = gen_rtx_MINUS (mode, temp1_rtx, source);
582- else
583- temp1_rtx = gen_rtx_fmt_ee (code, mode, source, temp1_rtx);
584-
585- emit_constant_insn (cond,
586- gen_rtx_SET (VOIDmode, new_src,
587- temp1_rtx));
588- source = new_src;
589- }
590-
591- if (code == SET)
592- {
593- can_invert = 0;
594- code = PLUS;
595- }
596- else if (code == MINUS)
597+ /* Calculate what the instruction sequences would be if we generated it
598+ normally, negated, or inverted. */
599+ if (code == AND)
600+ /* AND cannot be split into multiple insns, so invert and use BIC. */
601+ insns = 99;
602+ else
603+ insns = optimal_immediate_sequence (code, remainder, &pos_immediates);
604+
605+ if (can_negate)
606+ neg_insns = optimal_immediate_sequence (code, (-remainder) & 0xffffffff,
607+ &neg_immediates);
608+ else
609+ neg_insns = 99;
610+
611+ if (can_invert)
612+ inv_insns = optimal_immediate_sequence (code, remainder ^ 0xffffffff,
613+ &inv_immediates);
614+ else
615+ inv_insns = 99;
616+
617+ immediates = &pos_immediates;
618+
619+ /* Is the negated immediate sequence more efficient? */
620+ if (neg_insns < insns && neg_insns <= inv_insns)
621+ {
622+ insns = neg_insns;
623+ immediates = &neg_immediates;
624+ }
625+ else
626+ can_negate = 0;
627+
628+ /* Is the inverted immediate sequence more efficient?
629+ We must allow for an extra NOT instruction for XOR operations, although
630+ there is some chance that the final 'mvn' will get optimized later. */
631+ if (inv_insns < insns && (code != XOR || (inv_insns + 1) < insns))
632+ {
633+ insns = inv_insns;
634+ immediates = &inv_immediates;
635+
636+ if (code == XOR)
637+ final_invert = 1;
638+ }
639+ else
640+ can_invert = 0;
641+
642+ /* Now output the chosen sequence as instructions. */
643+ if (generate)
644+ {
645+ for (i = 0; i < insns; i++)
646+ {
647+ rtx new_src, temp1_rtx;
648+
649+ temp1 = immediates->i[i];
650+
651+ if (code == SET || code == MINUS)
652+ new_src = (subtargets ? gen_reg_rtx (mode) : target);
653+ else if ((final_invert || i < (insns - 1)) && subtargets)
654+ new_src = gen_reg_rtx (mode);
655+ else
656+ new_src = target;
657+
658+ if (can_invert)
659+ temp1 = ~temp1;
660+ else if (can_negate)
661+ temp1 = -temp1;
662+
663+ temp1 = trunc_int_for_mode (temp1, mode);
664+ temp1_rtx = GEN_INT (temp1);
665+
666+ if (code == SET)
667+ ;
668+ else if (code == MINUS)
669+ temp1_rtx = gen_rtx_MINUS (mode, temp1_rtx, source);
670+ else
671+ temp1_rtx = gen_rtx_fmt_ee (code, mode, source, temp1_rtx);
672+
673+ emit_constant_insn (cond,
674+ gen_rtx_SET (VOIDmode, new_src,
675+ temp1_rtx));
676+ source = new_src;
677+
678+ if (code == SET)
679+ {
680+ can_invert = 0;
681 code = PLUS;
682-
683- insns++;
684- i -= 8 - step_size;
685- }
686- /* Arm allows rotates by a multiple of two. Thumb-2 allows arbitrary
687- shifts. */
688- i -= step_size;
689- }
690- while (remainder);
691- }
692+ }
693+ else if (code == MINUS)
694+ code = PLUS;
695+ }
696+ }
697
698 if (final_invert)
699 {
700
701=== modified file 'gcc/config/arm/arm.md'
702--- gcc/config/arm/arm.md 2011-06-28 12:02:27 +0000
703+++ gcc/config/arm/arm.md 2011-07-05 15:18:53 +0000
704@@ -701,21 +701,24 @@
705 ;; (plus (reg rN) (reg sp)) into (reg rN). In this case reload will
706 ;; put the duplicated register first, and not try the commutative version.
707 (define_insn_and_split "*arm_addsi3"
708- [(set (match_operand:SI 0 "s_register_operand" "=r, k,r,r, k,r")
709- (plus:SI (match_operand:SI 1 "s_register_operand" "%rk,k,r,rk,k,rk")
710- (match_operand:SI 2 "reg_or_int_operand" "rI,rI,k,L, L,?n")))]
711+ [(set (match_operand:SI 0 "s_register_operand" "=r, k,r,r, k, r, k,r, k, r")
712+ (plus:SI (match_operand:SI 1 "s_register_operand" "%rk,k,r,rk,k, rk,k,rk,k, rk")
713+ (match_operand:SI 2 "reg_or_int_operand" "rI,rI,k,Pj,Pj,L, L,PJ,PJ,?n")))]
714 "TARGET_32BIT"
715 "@
716 add%?\\t%0, %1, %2
717 add%?\\t%0, %1, %2
718 add%?\\t%0, %2, %1
719- sub%?\\t%0, %1, #%n2
720- sub%?\\t%0, %1, #%n2
721+ addw%?\\t%0, %1, %2
722+ addw%?\\t%0, %1, %2
723+ sub%?\\t%0, %1, #%n2
724+ sub%?\\t%0, %1, #%n2
725+ subw%?\\t%0, %1, #%n2
726+ subw%?\\t%0, %1, #%n2
727 #"
728 "TARGET_32BIT
729 && GET_CODE (operands[2]) == CONST_INT
730- && !(const_ok_for_arm (INTVAL (operands[2]))
731- || const_ok_for_arm (-INTVAL (operands[2])))
732+ && !const_ok_for_op (INTVAL (operands[2]), PLUS)
733 && (reload_completed || !arm_eliminable_register (operands[1]))"
734 [(clobber (const_int 0))]
735 "
736@@ -724,8 +727,9 @@
737 operands[1], 0);
738 DONE;
739 "
740- [(set_attr "length" "4,4,4,4,4,16")
741- (set_attr "predicable" "yes")]
742+ [(set_attr "length" "4,4,4,4,4,4,4,4,4,16")
743+ (set_attr "predicable" "yes")
744+ (set_attr "arch" "*,*,*,t2,t2,*,*,t2,t2,*")]
745 )
746
747 (define_insn_and_split "*thumb1_addsi3"
748
749=== modified file 'gcc/config/arm/constraints.md'
750--- gcc/config/arm/constraints.md 2011-01-03 20:52:22 +0000
751+++ gcc/config/arm/constraints.md 2011-07-05 15:18:53 +0000
752@@ -31,7 +31,7 @@
753 ;; The following multi-letter normal constraints have been used:
754 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
755 ;; in Thumb-1 state: Pa, Pb, Pc, Pd
756-;; in Thumb-2 state: Ps, Pt, Pu, Pv, Pw, Px
757+;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px
758
759 ;; The following memory constraints have been used:
760 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
761@@ -74,6 +74,18 @@
762 (and (match_code "const_int")
763 (match_test "(ival & 0xffff0000) == 0")))))
764
765+(define_constraint "Pj"
766+ "@internal A 12-bit constant suitable for an ADDW or SUBW instruction. (Thumb-2)"
767+ (and (match_code "const_int")
768+ (and (match_test "TARGET_THUMB2")
769+ (match_test "(ival & 0xfffff000) == 0"))))
770+
771+(define_constraint "PJ"
772+ "@internal A constant that satisfies the Pj constraint if negated."
773+ (and (match_code "const_int")
774+ (and (match_test "TARGET_THUMB2")
775+ (match_test "((-ival) & 0xfffff000) == 0"))))
776+
777 (define_register_constraint "k" "STACK_REG"
778 "@internal The stack register.")
779
780
781=== added file 'gcc/testsuite/gcc.target/arm/thumb2-replicated-constant1.c'
782--- gcc/testsuite/gcc.target/arm/thumb2-replicated-constant1.c 1970-01-01 00:00:00 +0000
783+++ gcc/testsuite/gcc.target/arm/thumb2-replicated-constant1.c 2011-07-05 15:18:53 +0000
784@@ -0,0 +1,27 @@
785+/* Ensure simple replicated constant immediates work. */
786+/* { dg-options "-mthumb -O2" } */
787+/* { dg-require-effective-target arm_thumb2_ok } */
788+
789+int
790+foo1 (int a)
791+{
792+ return a + 0xfefefefe;
793+}
794+
795+/* { dg-final { scan-assembler "add.*#-16843010" } } */
796+
797+int
798+foo2 (int a)
799+{
800+ return a - 0xab00ab00;
801+}
802+
803+/* { dg-final { scan-assembler "sub.*#-1426019584" } } */
804+
805+int
806+foo3 (int a)
807+{
808+ return a & 0x00cd00cd;
809+}
810+
811+/* { dg-final { scan-assembler "and.*#13435085" } } */
812
813=== added file 'gcc/testsuite/gcc.target/arm/thumb2-replicated-constant2.c'
814--- gcc/testsuite/gcc.target/arm/thumb2-replicated-constant2.c 1970-01-01 00:00:00 +0000
815+++ gcc/testsuite/gcc.target/arm/thumb2-replicated-constant2.c 2011-07-05 15:18:53 +0000
816@@ -0,0 +1,75 @@
817+/* Ensure split constants can use replicated patterns. */
818+/* { dg-options "-mthumb -O2" } */
819+/* { dg-require-effective-target arm_thumb2_ok } */
820+
821+int
822+foo1 (int a)
823+{
824+ return a + 0xfe00fe01;
825+}
826+
827+/* { dg-final { scan-assembler "add.*#-33489408" } } */
828+/* { dg-final { scan-assembler "add.*#1" } } */
829+
830+int
831+foo2 (int a)
832+{
833+ return a + 0xdd01dd00;
834+}
835+
836+/* { dg-final { scan-assembler "add.*#-587145984" } } */
837+/* { dg-final { scan-assembler "add.*#65536" } } */
838+
839+int
840+foo3 (int a)
841+{
842+ return a + 0x00443344;
843+}
844+
845+/* { dg-final { scan-assembler "add.*#4456516" } } */
846+/* { dg-final { scan-assembler "add.*#13056" } } */
847+
848+int
849+foo4 (int a)
850+{
851+ return a + 0x77330033;
852+}
853+
854+/* { dg-final { scan-assembler "add.*#1996488704" } } */
855+/* { dg-final { scan-assembler "add.*#3342387" } } */
856+
857+int
858+foo5 (int a)
859+{
860+ return a + 0x11221122;
861+}
862+
863+/* { dg-final { scan-assembler "add.*#285217024" } } */
864+/* { dg-final { scan-assembler "add.*#2228258" } } */
865+
866+int
867+foo6 (int a)
868+{
869+ return a + 0x66666677;
870+}
871+
872+/* { dg-final { scan-assembler "add.*#1717986918" } } */
873+/* { dg-final { scan-assembler "add.*#17" } } */
874+
875+int
876+foo7 (int a)
877+{
878+ return a + 0x99888888;
879+}
880+
881+/* { dg-final { scan-assembler "add.*#-2004318072" } } */
882+/* { dg-final { scan-assembler "add.*#285212672" } } */
883+
884+int
885+foo8 (int a)
886+{
887+ return a + 0xdddddfff;
888+}
889+
890+/* { dg-final { scan-assembler "add.*#-572662307" } } */
891+/* { dg-final { scan-assembler "addw.*#546" } } */
892
893=== added file 'gcc/testsuite/gcc.target/arm/thumb2-replicated-constant3.c'
894--- gcc/testsuite/gcc.target/arm/thumb2-replicated-constant3.c 1970-01-01 00:00:00 +0000
895+++ gcc/testsuite/gcc.target/arm/thumb2-replicated-constant3.c 2011-07-05 15:18:53 +0000
896@@ -0,0 +1,28 @@
897+/* Ensure negated/inverted replicated constant immediates work. */
898+/* { dg-options "-mthumb -O2" } */
899+/* { dg-require-effective-target arm_thumb2_ok } */
900+
901+int
902+foo1 (int a)
903+{
904+ return a | 0xffffff00;
905+}
906+
907+/* { dg-final { scan-assembler "orn.*#255" } } */
908+
909+int
910+foo2 (int a)
911+{
912+ return a & 0xffeeffee;
913+}
914+
915+/* { dg-final { scan-assembler "bic.*#1114129" } } */
916+
917+int
918+foo3 (int a)
919+{
920+ return a & 0xaaaaaa00;
921+}
922+
923+/* { dg-final { scan-assembler "and.*#-1431655766" } } */
924+/* { dg-final { scan-assembler "bic.*#170" } } */
925
926=== added file 'gcc/testsuite/gcc.target/arm/thumb2-replicated-constant4.c'
927--- gcc/testsuite/gcc.target/arm/thumb2-replicated-constant4.c 1970-01-01 00:00:00 +0000
928+++ gcc/testsuite/gcc.target/arm/thumb2-replicated-constant4.c 2011-07-05 15:18:53 +0000
929@@ -0,0 +1,22 @@
930+/* Ensure replicated constants don't make things worse. */
931+/* { dg-options "-mthumb -O2" } */
932+/* { dg-require-effective-target arm_thumb2_ok } */
933+
934+int
935+foo1 (int a)
936+{
937+ /* It might be tempting to use 0x01000100, but it wouldn't help. */
938+ return a + 0x01f001e0;
939+}
940+
941+/* { dg-final { scan-assembler "add.*#32505856" } } */
942+/* { dg-final { scan-assembler "add.*#480" } } */
943+
944+int
945+foo2 (int a)
946+{
947+ return a + 0x0f100e10;
948+}
949+
950+/* { dg-final { scan-assembler "add.*#252706816" } } */
951+/* { dg-final { scan-assembler "add.*#3600" } } */
