Commit Graph

213783 Commits

Richard Biener
77bd23a3e2 Fall back to single-lane SLP before falling back to no SLP
The following changes the fallback to disable SLP when any of the
discovered SLP instances failed to pass vectorization checking into
a fallback that emulates what no SLP would do with SLP - force
single-lane discovery for all instances.

The patch does not remove the final fallback to disable SLP but it
reduces the fallout from failing vectorization when any non-SLP
stmt survives analysis.

	* tree-vectorizer.h (vect_analyze_slp): Add force_single_lane
	parameter.
	* tree-vect-slp.cc (vect_analyze_slp_instance): Remove
	defaulting of force_single_lane.
	(vect_build_slp_instance): Likewise.  Pass down appropriate
	force_single_lane.
	(vect_analyze_slp): Add force_single_lane parameter and pass
	it down appropriately.
	(vect_slp_analyze_bb_1): Always do multi-lane SLP.
	* tree-vect-loop.cc (vect_analyze_loop_2): Track two SLP
	modes and adjust accordingly.
	(vect_analyze_loop_1): Save the SLP mode when unrolling.

	* gcc.dg/vect/vect-outer-slp-1.c: Adjust.
2024-09-19 16:26:28 +02:00
Jason Merrill
d3a7302ec5 libstdc++: add #pragma diagnostic
The use of #pragma GCC system_header in libstdc++ has led to bugs going
undetected for a while due to the silencing of compiler warnings that would
have revealed them promptly, and also interferes with warnings about
problematic template instantiations induced by user code.

But removing it, or even compiling with -Wsystem-header, is also problematic
due to warnings about deliberate uses of extensions.

So this patch adds #pragma GCC diagnostic as needed to suppress these
warnings.

The change to acinclude.m4 changes -Wabi to warn only in comparison to ABI
19, to avoid lots of warnings that we now mangle concept requirements, which
are in any case still experimental.  I checked for any other changes against
ABI v15, and found only the <format> lambda mangling, which we can ignore.

This also enables -Wsystem-headers while building the library, so we see any
warnings not silenced by these #pragmas.

libstdc++-v3/ChangeLog:

	* include/bits/algorithmfwd.h:
	* include/bits/allocator.h:
	* include/bits/codecvt.h:
	* include/bits/concept_check.h:
	* include/bits/cpp_type_traits.h:
	* include/bits/hashtable.h:
	* include/bits/iterator_concepts.h:
	* include/bits/ostream_insert.h:
	* include/bits/ranges_base.h:
	* include/bits/regex_automaton.h:
	* include/bits/std_abs.h:
	* include/bits/stl_algo.h:
	* include/c_compatibility/fenv.h:
	* include/c_compatibility/inttypes.h:
	* include/c_compatibility/stdint.h:
	* include/ext/concurrence.h:
	* include/ext/type_traits.h:
	* testsuite/ext/type_traits/add_unsigned_floating_neg.cc:
	* testsuite/ext/type_traits/add_unsigned_integer_neg.cc:
	* testsuite/ext/type_traits/remove_unsigned_floating_neg.cc:
	* testsuite/ext/type_traits/remove_unsigned_integer_neg.cc:
	* include/bits/basic_ios.tcc:
	* include/bits/basic_string.tcc:
	* include/bits/fstream.tcc:
	* include/bits/istream.tcc:
	* include/bits/locale_classes.tcc:
	* include/bits/locale_facets.tcc:
	* include/bits/ostream.tcc:
	* include/bits/regex_compiler.tcc:
	* include/bits/sstream.tcc:
	* include/bits/streambuf.tcc:
	* configure: Regenerate.
	* include/bits/c++config:
	* include/c/cassert:
	* include/c/cctype:
	* include/c/cerrno:
	* include/c/cfloat:
	* include/c/climits:
	* include/c/clocale:
	* include/c/cmath:
	* include/c/csetjmp:
	* include/c/csignal:
	* include/c/cstdarg:
	* include/c/cstddef:
	* include/c/cstdio:
	* include/c/cstdlib:
	* include/c/cstring:
	* include/c/ctime:
	* include/c/cwchar:
	* include/c/cwctype:
	* include/c_global/climits:
	* include/c_global/cmath:
	* include/c_global/cstddef:
	* include/c_global/cstdlib:
	* include/decimal/decimal:
	* include/ext/rope:
	* include/std/any:
	* include/std/charconv:
	* include/std/complex:
	* include/std/coroutine:
	* include/std/format:
	* include/std/iomanip:
	* include/std/limits:
	* include/std/numbers:
	* include/tr1/functional:
	* include/tr1/tuple:
	* include/tr1/type_traits:
	* libsupc++/compare:
	* libsupc++/new: Add #pragma GCC diagnostic to suppress
	undesired warnings.
	* acinclude.m4: Change -Wabi version from 2 to 19.

gcc/ChangeLog:

	* ginclude/stdint-wrap.h: Add #pragma GCC diagnostic to suppress
	undesired warnings.
	* gsyslimits.h: Likewise.
2024-09-19 10:23:16 -04:00
Richard Biener
605d05b948 Always dump generated distance vectors
There's special-casing for equal access functions which bypasses
printing the distance vectors.  The following makes sure we print
them always which helps debugging.

	* tree-data-ref.cc (build_classic_dist_vector): Move
	distance vector dumping to single caller ...
	(subscript_dependence_tester): ... here, dumping always
	when we succeed computing it.
2024-09-19 13:34:24 +02:00
Richard Biener
5e3a4a0178 tree-optimization/116573 - .SELECT_VL for SLP
The following restores the use of .SELECT_VL for testcases where it
is safe to use even when using SLP.  For now I've restricted it
to single-lane SLP, plus I optimistically allow store-lane nodes
and assume single-lane roots are not widened except at most to
load-lane, which should be fine.

	PR tree-optimization/116573
	* tree-vect-loop.cc (vect_analyze_loop_2): Allow .SELECT_VL
	for SLP but disable it when there's multi-lane instances.
	* tree-vect-stmts.cc (vectorizable_store): Only compute the
	ptr increment when generating code.
	(vectorizable_load): Likewise.
2024-09-19 13:28:16 +02:00
Andre Vehreschild
de915fbe3c Fortran: Break recursion building recursive types. [PR106606]
Build a derived type component's type only when it is not already being
built and the component uses pointer semantics.

gcc/fortran/ChangeLog:

	PR fortran/106606

	* trans-types.cc (gfc_get_derived_type): Only build non-pointer
	derived types as component's types when they are not yet built.

gcc/testsuite/ChangeLog:

	* gfortran.dg/recursive_alloc_comp_5.f90: New test.
2024-09-19 12:17:08 +02:00
Pan Li
427f824258 RISC-V: Fix vector SAT_ADD dump check due to middle-end change
This patch would like to fix the dump check times of vector SAT_ADD.  The
middle-end change raises the match count from 2 to 4.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Adjust
	the dump check times from 2 to 4.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-19 18:13:31 +08:00
Pan Li
e917a251d8 Match: Support form 3 for scalar signed integer .SAT_ADD
This patch would like to support the form 3 of the scalar signed
integer .SAT_ADD.  Aka below example:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)           \
  T __attribute__((noinline))                            \
  sat_s_add_##T##_fmt_3 (T x, T y)                       \
  {                                                      \
    T sum;                                               \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return overflow ? x < 0 ? MIN : MAX : sum;           \
  }

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)

We can tell the difference before and after this patch if the backend
implements the ssadd<m>3 pattern similar to the below.

Before this patch:
   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
   6   │ {
   7   │   signed char _1;
   8   │   signed char _2;
   9   │   int8_t _3;
  10   │   __complex__ signed char _6;
  11   │   _Bool _8;
  12   │   signed char _9;
  13   │   signed char _10;
  14   │   signed char _11;
  15   │
  16   │ ;;   basic block 2, loop depth 0
  17   │ ;;    pred:       ENTRY
  18   │   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  19   │   _2 = IMAGPART_EXPR <_6>;
  20   │   if (_2 != 0)
  21   │     goto <bb 4>; [50.00%]
  22   │   else
  23   │     goto <bb 3>; [50.00%]
  24   │ ;;    succ:       4
  25   │ ;;                3
  26   │
  27   │ ;;   basic block 3, loop depth 0
  28   │ ;;    pred:       2
  29   │   _1 = REALPART_EXPR <_6>;
  30   │   goto <bb 5>; [100.00%]
  31   │ ;;    succ:       5
  32   │
  33   │ ;;   basic block 4, loop depth 0
  34   │ ;;    pred:       2
  35   │   _8 = x_4(D) < 0;
  36   │   _9 = (signed char) _8;
  37   │   _10 = -_9;
  38   │   _11 = _10 ^ 127;
  39   │ ;;    succ:       5
  40   │
  41   │ ;;   basic block 5, loop depth 0
  42   │ ;;    pred:       3
  43   │ ;;                4
  44   │   # _3 = PHI <_1(3), _11(4)>
  45   │   return _3;
  46   │ ;;    succ:       EXIT
  47   │
  48   │ }

After this patch:
   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
   6   │ {
   7   │   int8_t _3;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;    pred:       ENTRY
  11   │   _3 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  12   │   return _3;
  13   │ ;;    succ:       EXIT
  14   │
  15   │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

	* match.pd: Add the form 3 of signed .SAT_ADD matching.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-19 18:12:50 +08:00
Pan Li
2545a1abb7 Genmatch: Refine the gen_phi_on_cond by match_cond_with_binary_phi
This patch would like to leverage the match_cond_with_binary_phi to
match the phi on cond, and get the true/false arg if matched.  This
helps a lot to simplify the implementation of gen_phi_on_cond.

Before this patch:
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
    basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
    basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
    basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
    basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
    gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
    if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
        && EDGE_COUNT (_other_db_1->succs) == 1
        && EDGE_PRED (_other_db_1, 0)->src == _db_1)
        {
          tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
          tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
          tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
          bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
          tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
          tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
...

After this patch:
basic_block _b1 = gimple_bb (_a1);
tree _p1, _p2;
gcond *_cond_1 = match_cond_with_binary_phi (_a1, &_p1, &_p2);
if (_cond_1)
  {
    tree _cond_lhs_1 = gimple_cond_lhs (_cond_1);
    tree _cond_rhs_1 = gimple_cond_rhs (_cond_1);
    tree _p0 = build2 (gimple_cond_code (_cond_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
...

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

	* genmatch.cc (dt_operand::gen_phi_on_cond): Leverage the
	match_cond_with_binary_phi API to get cond gimple, true and
	false TREE arg.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-19 18:11:59 +08:00
Andre Vehreschild
361903ad1a Fix deep copy allocatable components in coarrays. [PR85002]
The code for a deep copy of allocatable components in nested derived-type
structures was generated, but not inserted, when the copy had to be done in
a coarray.  Fix that, and additionally fix a comment.

gcc/fortran/ChangeLog:

	PR fortran/85002
	* trans-array.cc (duplicate_allocatable_coarray): Allow adding
	of deep copy code in the when-allocated case.  Add bounds
	computation before condition, because coarrays need the bounds
	also when not allocated.
	(structure_alloc_comps): Duplication in the coarray case is done
	already, omit it.  Add the deep-copy code when duplicating a coarray.
	* trans-expr.cc (gfc_trans_structure_assign): Fix comment.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/alloc_comp_9.f90: New test.
2024-09-19 09:52:23 +02:00
Jennifer Schmitz
08aba2dd8c SVE intrinsics: Fold svmul with all-zero operands to zero vector
As recently implemented for svdiv, this patch folds svmul to a zero
vector if one of the operands is a zero vector. This transformation is
applied if at least one of the following conditions is met:
- the first operand is all zeros or
- the second operand is all zeros, and the predicate is ptrue or the
predication is _x or _z.

In contrast to constant folding, which was implemented in a previous
patch, this transformation is applied as soon as one of the operands is
a zero vector, while the other operand can be a variable.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
	Add folding of all-zero operands to zero vector.

gcc/testsuite/
	* gcc.target/aarch64/sve/const_fold_mul_1.c: Adjust expected
	outcome.
	* gcc.target/aarch64/sve/fold_mul_zero.c: New test.
2024-09-19 09:23:24 +02:00
Kyrylo Tkachov
9a99559a47 aarch64: Define l1_cache_line_size for -mcpu=neoverse-v2
This is a small patch that sets the L1 cache line size for Neoverse V2.
Unlike the other cache-related constants in there this value is not used just
for SW prefetch generation (which we want to avoid for Neoverse V2 presently).
It's also used to set std::hardware_destructive_interference_size.
See the links and recent discussions in PR116662 for reference.
Some CPU tunings in aarch64 set this value to something useful, but for
generic tuning we use the conservative 256, which forces 256-byte alignment
in such atomic structures.  Using a smaller value can decrease the size of such
structs during layout and should not present an ABI problem as
std::hardware_destructive_interference_size is not intended to be used for structs
in an external interface, and GCC warns about such uses.
Another place where the L1 cache line size is used is in phiopt for
-fhoist-adjacent-loads where conditional accesses to adjacent struct members
can be speculatively loaded as long as they are within the same L1 cache line.
e.g.
struct S { int i; int j; };

int
bar (struct S *x, int y)
{
  int r;
  if (y)
    r = x->i;
  else
    r = x->j;
  return r;
}

The Neoverse V2 L1 cache line is 64 bytes according to the TRM, so set it to
that. The rest of the prefetch parameters inherit from the generic tuning so
we don't do anything extra for software prefetches.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

	* config/aarch64/tuning_models/neoversev2.h (neoversev2_prefetch_tune):
	Define.
	(neoversev2_tunings): Use it.
2024-09-19 09:16:25 +02:00
Hu, Lin1
1cf1bf7899 i386: Add ssemov2, sseicvt2 for some load instructions that use memory on operand2
The memory attr of some instructions should be 'load', but it is
currently 'none'.

gcc/ChangeLog:

	* config/i386/i386.md: Add ssemov2, sseicvt2.
	* config/i386/sse.md (sse2_cvtsi2sd): Apply sseicvt2.
	(sse2_cvtsi2sdq<round_name>): Ditto.
	(vec_set<mode>_0): Apply ssemov2 for 4, 6.
2024-09-19 14:53:41 +08:00
Pan Li
65e060c7d8 Match: Add interface match_cond_with_binary_phi for true/false arg
When matching the cond with a 2-arg phi node, we need to figure out
which arg of the phi node comes from the true edge of the cond block, as
well as the false edge.  This patch would like to add an interface
to perform the action and return the true and false args in TREE type.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

	* gimple-match-head.cc (match_cond_with_binary_phi): Add new func
	impl to match binary phi for true and false arg.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-19 14:44:46 +08:00
Haochen Jiang
877fb9bdb0 doc: Add more alias option and reorder Intel CPU -march documentation
Since r15-3539, there are requests coming in to add other alias option
documentation. This patch will add all of them, including corei7, corei7-avx,
core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids.

Also in the patch, I reordered that part of documentation, currently all
the CPUs/products are just all over the place. I regrouped them by
date-to-now products (since the very first CPU to latest Panther Lake), P-core
(since the clients become hybrid cores, starting from Sapphire Rapids) and
E-core (since Bonnell to latest Clearwater Forest).

And in the patch, I refined the product names in documentation.

gcc/ChangeLog:

	* doc/invoke.texi: Add corei7, corei7-avx, core-avx-i,
	core-avx2, atom, slm, gracemont and emeraldrapids. Reorder
	the -march documentation by splitting them into date-to-now
	products, P-core and E-core. Refine the product names in
	documentation.
2024-09-19 14:12:50 +08:00
Haochen Jiang
89e62d42f3 i386: Enhance AVX10.2 convert tests
For AVX10.2 convert tests, all of them are missing mask tests
previously, this patch will add them in the tests.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Enhance mask test.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx512f-helper.h: Fix a typo in macro define.
2024-09-19 14:11:20 +08:00
Haochen Jiang
2b7b8d3bb5 i386: Add missing avx512f-mask-type.h include
In commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2
testcases, but we missed the following four.

The tests do not FAIL since the binutils part hasn't been merged
yet, which leads to UNSUPPORTED tests.  But avx512f-mask-type.h still
needs to be included; otherwise, it will be a compile error.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Include
	avx512f-mask-type.h.
	* gcc.target/i386/avx10_2-vminmaxsd-2.c: Ditto.
	* gcc.target/i386/avx10_2-vminmaxsh-2.c: Ditto.
	* gcc.target/i386/avx10_2-vminmaxss-2.c: Ditto.
2024-09-19 14:10:28 +08:00
Hans-Peter Nilsson
b1ea710b1b testsuite/gcc.dg/pr84877.c: Add machinery to stabilize stack alignment
This test awkwardly "blinks"; xfails and xpasses apparently
randomly for cris-elf using the "gdb simulator".  On
inspection, I see that the stack address depends on the
number of environment variables, deliberately passed to the
simulator, each adding the size of a pointer.

This test is IMHO important enough not to be skipped just
because it blinks (fixing the actual problem is a
different task).

I guess a random non-16 stack-alignment could happen for
other targets as well, so let's try to add generic
machinery to "stabilize" the test as failing, by allocating
a dynamic amount to make sure it's misaligned.  The most
target-dependent item here is an offset between the incoming
stack-pointer value (within main in the added framework) and
outgoing (within "xmain" as called from main when setting up
the p0 parameter).  I know there are other wonderful stack
shapes, but such targets would fall under the "complicated
situations"-label and are no worse off than before.

	* gcc.dg/pr84877.c: Try to make the test result consistent by
	misaligning the stack.
2024-09-19 04:22:58 +02:00
GCC Administrator
57faabfbb3 Daily bump. 2024-09-19 00:18:55 +00:00
Pan Li
1d16875134 RISC-V: Fix signed SAT_ADD test case for int64_t
The int8_t test for signed SAT_ADD is sat_s_add-1.c, the sat_s_add-4.c
should be for int64_t.  Thus, update sat_s_add-4.c for int64_t type.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_s_add-4.c: Update test for int64_t
	instead of int8_t.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-19 07:38:32 +08:00
Jason Merrill
aa338bdd46 libstdc++: add braces
GCC compiles with -fno-exceptions, so __throw_exception_again is a no-op,
and compilation gives a -Wempty-body warning here, so let's wrap it as is
already done in a few other files.

libstdc++-v3/ChangeLog:

	* include/bits/basic_ios.h: Add braces.
2024-09-18 17:33:46 -04:00
Andrew Kreimer
cc62b2c3da [PATCH] configure: fix typos
ChangeLog:
	* configure.ac: Fix typos.
	* configure: Rebuilt.
2024-09-18 11:51:45 -06:00
Patrick Palka
82c2acd0bc c++: alias of decltype(lambda) is opaque [PR116714, PR107390]
Here for

  using type = decltype([]{});
  static_assert(is_same_v<type, type>);

we strip the alias ahead of time during template argument coercion
which effectively transforms the template-id into

  is_same_v<decltype([]{}), decltype([]{})>

which is wrong because later substitution into the template-id will
produce two new lambdas with distinct types and cause is_same_v to
return false.

This demonstrates that such aliases should be considered opaque (a
notion that we recently introduced in r15-2331-g523836716137d0).
(An alternative solution might be to consider memoizing lambda-expr
substitution rather than always producing a new lambda, but this is
much simpler.)

	PR c++/116714
	PR c++/107390

gcc/cp/ChangeLog:

	* pt.cc (dependent_opaque_alias_p): Also return true for a
	decltype(lambda) alias.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/lambda-uneval18.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-18 13:50:43 -04:00
Francois-Xavier Coudert
fe1ed68000 jit: Ensure ssize_t is defined
On some targets it seems that ssize_t is not defined by any of the
headers transitively included by <stdio.h>.  This leads to a bootstrap
failure when jit is enabled.

gcc/jit/ChangeLog:

	* libgccjit.h: Include <sys/types.h>
2024-09-18 18:00:38 +02:00
John David Anglin
4b03750f8c hppa: Add peephole2 optimizations for REG+D loads and stores
The PA 1.x architecture only supports long displacements in
integer loads and stores.  Floating-point loads and stores
only support short displacements.  As a result, we have to
wait until reload is complete before generating insns with
long displacements.

The PA 2.0 architecture supports long displacements in both
integer and floating-point loads and stores.

The peephole2 optimizations added in this change are only
enabled when 14-bit long displacements aren't supported for
floating-point loads and stores.

2024-09-18  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

	* config/pa/pa.h (GENERAL_REGNO_P): Define.
	* config/pa/pa.md: Add SImode and SFmode peephole2
	patterns to generate loads and stores with long
	displacements.
2024-09-18 11:02:32 -04:00
Jin Ma
85fcf74034 [PATCH v3] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.
gcc/ChangeLog:

	* config/riscv/riscv.md: Change "truncate" to unspec for the Zfa extension on rv32.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
2024-09-18 08:57:27 -06:00
Filip Kastl
4b7e6d5faa contrib: Set check-params-in-docs.py to skip tables of values of a param
Currently check-params-in-docs.py reports extra params being listed in
invoke.texi.  However, those aren't actual params but items in a table of
possible values of the aarch64-autove-preference param.

This patch changes check-params-in-docs.py to ignore similar tables.

contrib/ChangeLog:

	* check-params-in-docs.py: Skip tables of values of a param.
	Remove code that skips items beginning with a number.

Signed-off-by: Filip Kastl <fkastl@suse.cz>
2024-09-18 16:38:30 +02:00
Richard Biener
de1389e24e Fail vectorization when not using SLP and --param vect-force-slp == 1
The following adds --param vect-force-slp to enable the transition
to full SLP.  Full SLP is enforced during stmt analysis where it
detects failed SLP discovery and at loop analysis time where it
avoids analyzing a loop with SLP disabled.  Failure to SLP results
in vectorization to fail.

	* params.opt (vect-force-slp): New param, default 0.
	* doc/invoke.texi (--param vect-force-slp): Document.
	* tree-vect-loop.cc (vect_analyze_loop_2): When analyzing
	without SLP but --param vect-force-slp is 1 fail.
	* tree-vect-stmts.cc (vect_analyze_stmt): Fail vectorization
	for non-SLP stmts when --param vect-force-slp is 1.
2024-09-18 15:40:10 +02:00
Xianmiao Qu
ad5bfc2b70 [PATCH 1/2] RISC-V: Fix the outer_code when calculating the cost of SET expression.
I think it is a typo. When calculating the 'SET_SRC (x)' cost,
outer_code should be set to SET.

gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Fix the outer_code
	when calculating the cost of SET expression.
2024-09-18 07:35:12 -06:00
Xianmiao Qu
ec34a4481b [PATCH] RISC-V: Fix th.extu operands exceeding range on rv32.
The Combine Pass may generate zero_extract instructions that are out of range.
Drawing from other architectures like AArch64, we should impose restrictions
on the "*th_extu<mode>4" pattern.

gcc/
	* config/riscv/thead.md (*th_extu<mode>4): Fix th.extu
	operands exceeding range on rv32.

gcc/testsuite/
	* gcc.target/riscv/xtheadbb-extu-4.c: New.
2024-09-18 07:29:36 -06:00
Bohan Lei
0756f335fb [PATCH] RISC-V: Allow zero operand for DI variants of vssubu.vx
The RISC-V vector machine description relies on the helper function
`sew64_scalar_helper` to emit actual insns for the DI variants of
vssub.vx and vssubu.vx.  This works with vssub.vx, but can cause
problems with vssubu.vx with the scalar operand being constant zero,
because `has_vi_variant_p` returns false, and the operand will be taken
without being loaded into a reg.  The attached testcases can cause an
internal compiler error as a result.

Allowing a constant zero operand in those insns seems to be a simple
solution that only affects a minimal amount of existing code.

gcc/ChangeLog:

	* config/riscv/vector.md: Allow zero operand for DI variants of
	vssubu.vx

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/vssubu-1.c: New test.
	* gcc.target/riscv/rvv/base/vssubu-2.c: New test.
2024-09-18 07:22:04 -06:00
Jason Merrill
5c8f9f4d4c c++: -Wdangling-reference diagnostic
The -Wdangling-reference diagnostic talks about the full-expression, but
prints one call, while the full-expression in a declaration is the entire
initialization.  It seems more useful to point out the temporary that the
compiler thinks we might be getting a dangling reference to.

gcc/cp/ChangeLog:

	* call.cc (do_warn_dangling_reference): Return temporary
	instead of the call it's passed to.
	(maybe_warn_dangling_reference): Adjust diagnostic.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wdangling-reference1.C: Adjust diagnostic.
2024-09-18 08:59:04 -04:00
Jason Merrill
8733d5d387 c++: -Wdangling-reference and empty class [PR115361]
We can't have a dangling reference to an empty class unless it's
specifically to that class or one of its bases.  This was giving a
false positive on the _ExtractKey pattern in libstdc++ hashtable.h.

This also adjusts the order of arguments to reference_related_p, which
is relevant for empty classes (unlike scalars).

Several of the classes in the testsuite needed to gain data members to
continue to warn.

	PR c++/115361

gcc/cp/ChangeLog:

	* call.cc (do_warn_dangling_reference): Check is_empty_class.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/attr-no-dangling6.C
	* g++.dg/ext/attr-no-dangling7.C
	* g++.dg/ext/attr-no-dangling8.C
	* g++.dg/ext/attr-no-dangling9.C
	* g++.dg/warn/Wdangling-reference1.C
	* g++.dg/warn/Wdangling-reference2.C
	* g++.dg/warn/Wdangling-reference3.C: Make classes non-empty.
	* g++.dg/warn/Wdangling-reference23.C: New test.
2024-09-18 08:58:52 -04:00
Jennifer Schmitz
6f3b6a4517 match.pd: Check trunc_mod vector optab before folding.
In the pattern X - (X / Y) * Y to X % Y, this patch guards the
simplification for vector types by a check for:
1) Support of the mod optab for vectors OR
2) Application before vector lowering for non-VL vectors.
This is to prevent reverting vectorization of modulo to div/mult/sub
if the target does not support vector mod optab.

The patch was bootstrapped and tested with no regression on
aarch64-linux-gnu and x86_64-linux-gnu.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	PR tree-optimization/116569
	* match.pd: Guard simplification to trunc_mod with check for
	mod optab support.

gcc/testsuite/
	PR tree-optimization/116569
	* gcc.dg/torture/pr116569.c: New test.
2024-09-18 13:07:25 +02:00
Georg-Johann Lay
5bfb91c14f reload1.cc: rtl-optimization/116326 - Use RELOAD_ELIMINABLE_REGS.
The new macro is required because reload and LRA are using different
representations for a multi-register frame pointer.  As ELIMINABLE_REGS
is used to initialize static const objects, it can't depend on -mlra.

	PR rtl-optimization/116326
gcc/
	* reload1.cc (reg_eliminate_1): Initialize from
	RELOAD_ELIMINABLE_REGS if defined.
	* config/avr/avr.h (RELOAD_ELIMINABLE_REGS): Copy from ELIMINABLE_REGS.
	(ELIMINABLE_REGS): Don't mention sub-regnos of the frame pointer.
	* doc/tm.texi.in (Eliminating Frame Pointer and Arg Pointer)
	<RELOAD_ELIMINABLE_REGS>: Add documentation.
	* doc/tm.texi: Rebuild.
gcc/testsuite/
	* gcc.target/avr/torture/lra-pr116324.c: New test.
	* gcc.target/avr/torture/lra-pr116325.c: New test.
2024-09-18 11:17:23 +02:00
Georg-Johann Lay
cdeebc71c4 AVR: doc/install.texi - Update avr specific installation notes.
gcc/
	* doc/install.texi (Host/Target specific installation notes for GCC)
	[avr]: Update web links to AVR-LibC and AVR Options.
	Remove outdated note about Binutils.
2024-09-18 11:15:09 +02:00
Richard Biener
1d0cb3b5fc tree-optimization/116585 - SSA corruption with split_constant_offset
split_constant_offset when looking through SSA defs can end up
picking SSA leafs that are subject to abnormal coalescing.  This
can lead to downstream consumers to insert code based on the
result (like from dataref analysis) in places that violate constraints
for abnormal coalescing.  It's best to not expand defs whose operands
are subject to abnormal coalescing - and not either do something when
a subexpression has operands like that already.

	PR tree-optimization/116585
	* tree-data-ref.cc (split_constant_offset_1): When either
	operand is subject to abnormal coalescing do no further
	processing.

	* gcc.dg/torture/pr116585.c: New testcase.
2024-09-18 11:11:28 +02:00
Andrew Pinski
45cacfe732 phiopt: C++ify cond_if_else_store_replacement
This C++ifies cond_if_else_store_replacement by using range fors
and a vec of std::pair instead of 2 vecs.
I had a hard time understanding the code when there were 2 vecs,
so having a vec of pairs makes it easier to understand the relationship
between the 2.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (cond_if_else_store_replacement): Use
	range fors and use one vec for then/else stores instead of 2.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-17 23:20:40 -07:00
Andrew Pinski
8590dcd318 phiopt: Add some details dump to cselim
While trying to debug PR 116747, I noticed there was no dump
saying what was done. So this adds the debug dump and it helps
debug what is going on in PR 116747 too.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (cond_if_else_store_replacement_1): Add debug dump.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-17 23:20:40 -07:00
Pan Li
a82896ed7b RISC-V: Implement SAT_ADD for signed integer vector
This patch implements ssadd for vector integer, aka form 1 of
vector ssadd.

Form 1:
  #define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX)                     \
  void __attribute__((noinline))                                       \
  vec_sat_s_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        T x = op_1[i];                                                 \
        T y = op_2[i];                                                 \
        T sum = (UT)x + (UT)y;                                         \
        out[i] = (x ^ y) < 0                                           \
          ? sum                                                        \
          : (sum ^ x) >= 0                                             \
            ? sum                                                      \
            : x < 0 ? MIN : MAX;                                       \
      }                                                                \
  }

DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)

Before this patch:
vec_sat_s_add_int64_t_fmt_1:
  ...
  vsetvli  t1,zero,e64,m1,ta,mu
  vadd.vv  v3,v1,v2
  vxor.vv  v0,v1,v3
  vmslt.vi v0,v0,0
  vxor.vv  v2,v1,v2
  vmsge.vi v2,v2,0
  vmand.mm v0,v0,v2
  vsra.vx  v1,v1,t3
  vxor.vv  v3,v1,v4,v0.t
  ...

After this patch:
vec_sat_s_add_int64_t_fmt_1:
  ...
  vsetvli  a6,zero,e64,m1,ta,ma
  vsadd.vv v1,v1,v2
  ...

The below test suites are passed for this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

	* config/riscv/autovec.md (ssadd<mode>3): Add new pattern for
	signed integer vector SAT_ADD.
	* config/riscv/riscv-protos.h (expand_vec_ssadd): Add new func
	decl for vector ssadd expanding.
	* config/riscv/riscv-v.cc (expand_vec_ssadd): Add new func impl
	to expand vector ssadd pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
	data for vector ssadd.
	* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper
	macros.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-4.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-18 09:25:39 +08:00
Michael Meissner
9a07ac1513 PR 89213: Add better support for shifting vectors with 64-bit elements
This patch fixes PR target/89213 to allow better code to be generated to do
constant shifts of V2DI/V2DF vectors.  Previously GCC would do constant shifts
of vectors with 64-bit elements by using:

	XXSPLTIB 32,4
	VEXTSB2D 0,0
	VSRAD 2,2,0

I.e., the PowerPC does not have a VSPLTISD instruction to load -15..14 for the
64-bit shift count in one instruction.  Instead, it would need to load a byte
and then convert it to 64-bit.

With this patch, GCC now realizes that the vector shift instructions will look
at the bottom 6 bits for the shift count, and it can use either a VSPLTISW or
XXSPLTIB instruction to load the shift count.

2024-09-17  Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/89213
	* config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
	(VSHIFT_MODE): New mode iterator.
	(vshift_code): New code iterator.
	(vshift_attr): New code attribute.
	(altivec_<mode>_<vshift_attr>_const): New pattern to optimize
	vector long long/int shifts by a constant.
	(altivec_<mode>_shift_const): New helper insn to load up a
	constant used by the shift operation.
	* config/rs6000/predicates.md (vector_shift_constant): New
	predicate.

gcc/testsuite/

	PR target/89213
	* gcc.target/powerpc/pr89213.c: New test.
	* gcc.target/powerpc/vec-rlmi-rlnm.c: Update instruction count.
2024-09-17 21:05:27 -04:00
GCC Administrator
38b5a568f8 Daily bump. 2024-09-18 00:16:55 +00:00
Marek Polacek
d6d8445c85 c++: fix constexpr cast from void* diag issue [PR116741]
The result of build_fold_indirect_ref can be a COMPONENT_REF in
which case using DECL_SOURCE_LOCATION will crash.  Look at its op1
instead.

	PR c++/116741

gcc/cp/ChangeLog:

	* constexpr.cc (cxx_eval_constant_expression) <case CONVERT_EXPR>: If
	the result of build_fold_indirect_ref is a COMPONENT_REF, use its op1.
	Check DECL_P before calling inform.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp26/constexpr-voidptr4.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-17 17:04:20 -04:00
Marek Polacek
7ca486889b c++: ICE with -Wtautological-compare in template [PR116534]
Pre r14-4793, we'd call warn_tautological_cmp -> operand_equal_p
with operands wrapped in NON_DEPENDENT_EXPR, which works, since
o_e_p bails for codes it doesn't know.  But now we pass operands
not encapsulated in NON_DEPENDENT_EXPR, and crash, because the
template tree for &a[x] has null DECL_FIELD_OFFSET.

This patch extends r12-7797 to cover the case when DECL_FIELD_OFFSET
is null.

	PR c++/116534

gcc/ChangeLog:

	* fold-const.cc (operand_compare::operand_equal_p): If either
	field's DECL_FIELD_OFFSET is null, compare the fields with ==.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wtautological-compare4.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-17 16:59:09 -04:00
Marek Polacek
dfe0d4389a c++: crash with anon VAR_DECL [PR116676]
r12-3495 added maybe_warn_about_constant_value which will crash if
it gets a nameless VAR_DECL, which is what happens in this PR.

We created this VAR_DECL in cp_parser_decomposition_declaration.

	PR c++/116676

gcc/cp/ChangeLog:

	* constexpr.cc (maybe_warn_about_constant_value): Check DECL_NAME.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/constexpr-116676.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-17 11:27:39 -04:00
Jennifer Schmitz
e311dd13a9 SVE intrinsics: Fold svdiv with all-zero operands to zero vector
This patch folds svdiv where one of the operands is all-zeros to a zero
vector, if one of the following conditions holds:
- the dividend is all zeros or
- the divisor is all zeros, and the predicate is ptrue or the predication
is _x or _z.
This case was not covered by the recent patch that implemented constant
folding, because that covered only cases where both operands are
constant vectors. Here, the operation is folded as soon as one of the operands
is a constant zero vector.
Folding of division by 0 to return 0 is in accordance with
the semantics of sdiv and udiv.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
	Add folding of all-zero operands to zero vector.

gcc/testsuite/
	* gcc.target/aarch64/sve/fold_div_zero.c: New test.
	* gcc.target/aarch64/sve/const_fold_div_1.c: Adjust expected
	outcome.
2024-09-17 08:33:24 +02:00
GCC Administrator
008f4510d7 Daily bump. 2024-09-17 00:17:21 +00:00
Pengxuan Zheng
a92f54f580 aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]
SVE's INDEX instruction can be used to populate vectors with values starting
from "base" and incremented by "step" for each subsequent element. We can take
advantage of it to generate vector constants if TARGET_SVE is available and the
base and step values are within [-16, 15].

For example, with the following function:

typedef int v4si __attribute__ ((vector_size (16)));
v4si
f_v4si (void)
{
  return (v4si){ 0, 1, 2, 3 };
}

GCC currently generates:

f_v4si:
	adrp    x0, .LC4
	ldr     q0, [x0, #:lo12:.LC4]
	ret

.LC4:
	.word   0
	.word   1
	.word   2
	.word   3

With this patch, we generate an INDEX instruction instead if TARGET_SVE is
available.

f_v4si:
	index   z0.s, #0, #1
	ret

	PR target/113328

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_simd_valid_immediate): Improve
	handling of some ADVSIMD vectors by using SVE's INDEX if TARGET_SVE is
	available.
	(aarch64_output_simd_mov_immediate): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
	SVE's INDEX instruction.
	* gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_3.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
2024-09-16 10:31:10 -07:00
Gaius Mulley
58bc39c73c modula2: gcc/m2/Make-lang.in fix includes during bootstrap build
This patch fixes the include directories used when building objects in
gm2-compiler-boot.  It adds the missing gm2-gcc directory and uses a
new variable GM2_BOOT_INCLUDES for all gm2-compiler-boot rules.

gcc/m2/ChangeLog:

	* Make-lang.in (GM2_BOOT_INCLUDES): New variable.
	(m2/gm2-compiler-boot/M2GCCDeclare.o): Rewrite to use
	GM2_BOOT_INCLUDES.
	(m2/gm2-compiler-boot/M2Error.o): Ditto.
	(m2/gm2-compiler-boot/%.o): Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-09-16 18:18:11 +01:00
Georg-Johann Lay
f5448384a2 AVR: Update weblinks to AVR-LibC.
AVR-LibC has moved to GitHub, adjust web links:
https://github.com/avrdudes/avr-libc (project)
https://avrdudes.github.io/avr-libc/avr-libc-user-manual (wwwdocs)

gcc/
	* doc/invoke.texi (AVR Options): Update AVR-LibC weblink from
	nongnu.org to https://github.com/avrdudes/avr-libc
	* doc/extend.texi (AVR Named Address Spaces): Same.
	(AVR Function Attributes): Same.
	* doc/install.texi (Cross-Compiler-Specific Options, AVR): Same.
2024-09-16 17:51:43 +02:00
Soumya AR
4af196b2eb aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE instructions.
On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
instructions like SHL have a throughput of 2. We can lean on that to emit code
like:
 add	z31.b, z31.b, z31.b
instead of:
 lsl	z31.b, z31.b, #1

The implementation of this change for SVE vectors is similar to a prior patch
<https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659958.html> that adds
the above functionality for Neon vectors.

Here, the machine description pattern is split up to separately accommodate left
and right shifts, so we can specifically emit an add for all left shifts by 1.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (*post_ra_v<optab><mode>3): Split pattern
	to accommodate left and right shifts separately.
	(*post_ra_v_ashl<mode>3): Matches left shifts with additional
	constraint to check for shifts by 1.
	(*post_ra_v_<optab><mode>3): Matches right shifts.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of lsl-1
	with corresponding add.
	* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
	* gcc.target/aarch64/sve/adr_1.c: Likewise.
	* gcc.target/aarch64/sve/adr_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
	* gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
	* gcc.target/aarch64/sve/shift_2.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/sve_shl_add.c: New test.
2024-09-16 16:53:45 +02:00