When matching the cond with 2 args phi node, we need to figure out
which arg of phi node comes from the true edge of cond block, as
well as the false edge. This patch would like to add interface
to perform the action and return the true and false arg in TREE type.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* gimple-match-head.cc (match_cond_with_binary_phi): Add new func
impl to match binary phi for true and false arg.
Signed-off-by: Pan Li <pan2.li@intel.com>
Since r15-3539, there are requests coming in to add other alias option
documentation. This patch will add all ot them, including corei7, corei7-avx,
core-avx-i, core-avx2, atom, slm, gracemont and emerarldrapids.
Also in the patch, I reordered that part of documentation, currently all
the CPUs/products are just all over the place. I regrouped them by
date-to-now products (since the very first CPU to latest Panther Lake), P-core
(since the clients become hybrid cores, starting from Sapphire Rapids) and
E-core (since Bonnell to latest Clearwater Forest).
And in the patch, I refined the product names in documentation.
gcc/ChangeLog:
* doc/invoke.texi: Add corei7, corei7-avx, core-avx-i,
core-avx2, atom, slm, gracemont and emerarldrapids. Reorder
the -march documentation by splitting them into date-to-now
products, P-core and E-core. Refine the product names in
documentation.
Since commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2
testcases, but we missed the following four.
The tests are not FAIL since the binutils part haven't been merged
yet, which leads to UNSUPPORTED test. But the avx512f-mask-type.h
needs to be included, otherwise, it will be compile error.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Include
avx512f-mask-type.h.
* gcc.target/i386/avx10_2-vminmaxsd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsh-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxss-2.c: Ditto.
This test awkwardly "blinks"; xfails and xpasses apparently
randomly for cris-elf using the "gdb simulator". On
inspection, I see that the stack address depends on the
number of environment variables, deliberately passed to the
simulator, each adding the size of a pointer.
This test is IMHO important enough not to be just skipped
just because it blinks (fixing the actual problem is a
different task).
I guess a random non-16 stack-alignment could happen for
other targets as well, so let's try and add a generic
machinery to "stabilize" the test as failing, by allocating
a dynamic amount to make sure it's misaligned. The most
target-dependent item here is an offset between the incoming
stack-pointer value (within main in the added framework) and
outgoing (within "xmain" as called from main when setting up
the p0 parameter). I know there are other wonderful stack
shapes, but such targets would fall under the "complicated
situations"-label and are no worse off than before.
* gcc.dg/pr84877.c: Try to make the test result consistent by
misaligning the stack.
The int8_t test for signed SAT_ADD is sat_s_add-1.c, the sat_s_add-4.c
should be for int64_t. Thus, update sat_s_add-4.c for int64_t type.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_s_add-4.c: Update test for int64_t
instead of int8_t.
Signed-off-by: Pan Li <pan2.li@intel.com>
GCC compiles with -fno-exceptions, so __throw_exception_again is a no-op,
and compilation gives a -Wempty-body warning here, so let's wrap it as is
already done in a few other files.
libstdc++-v3/ChangeLog:
* include/bits/basic_ios.h: Add braces.
Here for
using type = decltype([]{});
static_assert(is_same_v<type, type>);
we strip the alias ahead of time during template argument coercion
which effectively transforms the template-id into
is_same_v<decltype([]{}), decltype([]{})>
which is wrong because later substitution into the template-id will
produce two new lambdas with distinct types and cause is_same_v to
return false.
This demonstrates that such aliases should be considered opaque (a
notion that we recently introduced in r15-2331-g523836716137d0).
(An alternative solution might be to consider memoizing lambda-expr
substitution rather than always producing a new lambda, but this is
much simpler.)
PR c++/116714
PR c++/107390
gcc/cp/ChangeLog:
* pt.cc (dependent_opaque_alias_p): Also return true for a
decltype(lambda) alias.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-uneval18.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
On some targets it seems that ssize_t is not defined by any of the
headers transitively included by <stdio.h>. This leads to a bootstrap
fail when jit is enabled.
gcc/jit/ChangeLog:
* libgccjit.h: Include <sys/types.h>
The PA 1.x architecture only supports long displacements in
integer loads and stores. Floating-point loads and stores
only support short displacements. As a result, we have to
wait until reload is complete before generating insns with
long displacements.
The PA 2.0 architecture supports long displacements in both
integer and floating-point loads and stores.
The peephole2 optimizations added in this change are only
enabled when 14-bit long displacements aren't supported for
floating-point loads and stores.
2024-09-18 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.h (GENERAL_REGNO_P): Define.
* config/pa/pa.md: Add SImode and SFmode peephole2
patterns to generate loads and stores with long
displacements.
gcc/ChangeLog:
* config/riscv/riscv.md: Change "truncate" to unspec for the Zfa extension on rv32.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
Currently check-params-in-docs.py reports extra params being listed in
invoke.texi. However, those aren't actual params but items in a table of
possible values of the aarch64-autove-preference param.
This patch changes check-params-in-docs.py to ignore similar tables.
contrib/ChangeLog:
* check-params-in-docs.py: Skip tables of values of a param.
Remove code that skips items beginning with a number.
Signed-off-by: Filip Kastl <fkastl@suse.cz>
The following adds --param vect-force-slp to enable the transition
to full SLP. Full SLP is enforced during stmt analysis where it
detects failed SLP discovery and at loop analysis time where it
avoids analyzing a loop with SLP disabled. Failure to SLP results
in vectorization to fail.
* params.opt (vect-force-slp): New param, default 0.
* doc/invoke.texi (--param vect-force-slp): Document.
* tree-vect-loop.cc (vect_analyze_loop_2): When analyzing
without SLP but --param vect-force-slp is 1 fail.
* tree-vect-stmts.cc (vect_analyze_stmt): Fail vectorization
for non-SLP stmts when --param vect-force-slp is 1.
I think it is a typo. When calculating the 'SET_SRC (x)' cost,
outer_code should be set to SET.
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Fix the outer_code
when calculating the cost of SET expression.
The Combine Pass may generate zero_extract instructions that are out of range.
Drawing from other architectures like AArch64, we should impose restrictions
on the "*th_extu<mode>4" pattern.
gcc/
* config/riscv/thead.md (*th_extu<mode>4): Fix th.extu
operands exceeding range on rv32.
gcc/testsuite/
* gcc.target/riscv/xtheadbb-extu-4.c: New.
The RISC-V vector machine description relies on the helper function
`sew64_scalar_helper` to emit actual insns for the DI variants of
vssub.vx and vssubu.vx. This works with vssub.vx, but can cause
problems with vssubu.vx with the scalar operand being constant zero,
because `has_vi_variant_p` returns false, and the operand will be taken
without being loaded into a reg. The attached testcases can cause an
internal compiler error as a result.
Allowing a constant zero operand in those insns seems to be a simple
solution that only affects minimum existing code.
gcc/ChangeLog:
* config/riscv/vector.md: Allow zero operand for DI variants of
vssubu.vx
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vssubu-1.c: New test.
* gcc.target/riscv/rvv/base/vssubu-2.c: New test.
The -Wdangling-reference diagnostic talks about the full-expression, but
prints one call, while the full-expression in a declaration is the entire
initialization. It seems more useful to point out the temporary that the
compiler thinks we might be getting a dangling reference to.
gcc/cp/ChangeLog:
* call.cc (do_warn_dangling_reference): Return temporary
instead of the call it's passed to.
(maybe_warn_dangling_reference): Adjust diagnostic.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wdangling-reference1.C: Adjust diagnostic.
We can't have a dangling reference to an empty class unless it's
specifically to that class or one of its bases. This was giving a
false positive on the _ExtractKey pattern in libstdc++ hashtable.h.
This also adjusts the order of arguments to reference_related_p, which
is relevant for empty classes (unlike scalars).
Several of the classes in the testsuite needed to gain data members to
continue to warn.
PR c++/115361
gcc/cp/ChangeLog:
* call.cc (do_warn_dangling_reference): Check is_empty_class.
gcc/testsuite/ChangeLog:
* g++.dg/ext/attr-no-dangling6.C
* g++.dg/ext/attr-no-dangling7.C
* g++.dg/ext/attr-no-dangling8.C
* g++.dg/ext/attr-no-dangling9.C
* g++.dg/warn/Wdangling-reference1.C
* g++.dg/warn/Wdangling-reference2.C
* g++.dg/warn/Wdangling-reference3.C: Make classes non-empty.
* g++.dg/warn/Wdangling-reference23.C: New test.
In the pattern X - (X / Y) * Y to X % Y, this patch guards the
simplification for vector types by a check for:
1) Support of the mod optab for vectors OR
2) Application before vector lowering for non-VL vectors.
This is to prevent reverting vectorization of modulo to div/mult/sub
if the target does not support vector mod optab.
The patch was bootstrapped and tested with no regression on
aarch64-linux-gnu and x86_64-linux-gnu.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR tree-optimization/116569
* match.pd: Guard simplification to trunc_mod with check for
mod optab support.
gcc/testsuite/
PR tree-optimization/116569
* gcc.dg/torture/pr116569.c: New test.
The new macro is required because reload and LRA are using different
representations for a multi-register frame pointer. As ELIMINABLE_REGS
is used to initialize static const objects, it can't depend on -mlra.
PR rtl-optimization/116326
gcc/
* reload1.cc (reg_eliminate_1): Initialize from
RELOAD_ELIMINABLE_REGS if defined.
* config/avr/avr.h (RELOAD_ELIMINABLE_REGS): Copy from ELIMINABLE_REGS.
(ELIMINABLE_REGS): Don't mention sub-regnos of the frame pointer.
* doc/tm.texi.in (Eliminating Frame Pointer and Arg Pointer)
<RELOAD_ELIMINABLE_REGS>: Add documentation.
* doc/tm.texi: Rebuild.
gcc/testsuite/
* gcc.target/avr/torture/lra-pr116324.c: New test.
* gcc.target/avr/torture/lra-pr116325.c: New test.
gcc/
* doc/install.texi (Host/Target specific installation notes for GCC)
[avr]: Update web links to AVR-LibC and AVR Options.
Remove outdated note about Binutils.
split_constant_offset when looking through SSA defs can end up
picking SSA leafs that are subject to abnormal coalescing. This
can lead to downstream consumers to insert code based on the
result (like from dataref analysis) in places that violate constraints
for abnormal coalescing. It's best to not expand defs whose operands
are subject to abnormal coalescing - and not either do something when
a subexpression has operands like that already.
PR tree-optimization/116585
* tree-data-ref.cc (split_constant_offset_1): When either
operand is subject to abnormal coalescing do no further
processing.
* gcc.dg/torture/pr116585.c: New testcase.
This C++ify cond_if_else_store_replacement by using range fors
and changing using a std::pair instead of 2 vecs.
I had a hard time understanding the code when there was 2 vecs
so having a vec of a pair makes it easier to understand the relationship
between the 2.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (cond_if_else_store_replacement): Use
range fors and use one vec for then/else stores instead of 2.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
While trying to debug PR 116747, I noticed there was no dump
saying what was done. So this adds the debug dump and it helps
debug what is going on in PR 116747 too.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (cond_if_else_store_replacement_1): Add debug dump.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This patch would like to implement the ssadd for vector integer. Aka
form 1 of ssadd vector.
Form 1:
#define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T sum = (UT)x + (UT)y; \
out[i] = (x ^ y) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
} \
}
DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
Before this patch:
vec_sat_s_add_int64_t_fmt_1:
...
vsetvli t1,zero,e64,m1,ta,mu
vadd.vv v3,v1,v2
vxor.vv v0,v1,v3
vmslt.vi v0,v0,0
vxor.vv v2,v1,v2
vmsge.vi v2,v2,0
vmand.mm v0,v0,v2
vsra.vx v1,v1,t3
vxor.vv v3,v1,v4,v0.t
...
After this patch:
vec_sat_s_add_int64_t_fmt_1:
...
vsetvli a6,zero,e64,m1,ta,ma
vsadd.vv v1,v1,v2
...
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (ssadd<mode>3): Add new pattern for
signed integer vector SAT_ADD.
* config/riscv/riscv-protos.h (expand_vec_ssadd): Add new func
decl for vector ssadd expanding.
* config/riscv/riscv-v.cc (expand_vec_ssadd): Add new func impl
to expand vector ssadd pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for vector ssadd.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper
macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-4.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
This patch fixes PR target/89213 to allow better code to be generated to do
constant shifts of V2DI/V2DF vectors. Previously GCC would do constant shifts
of vectors with 64-bit elements by using:
XXSPLTIB 32,4
VEXTSB2D 0,0
VSRAD 2,2,0
I.e., the PowerPC does not have a VSPLTISD instruction to load -15..14 for the
64-bit shift count in one instruction. Instead, it would need to load a byte
and then convert it to 64-bit.
With this patch, GCC now realizes that the vector shift instructions will look
at the bottom 6 bits for the shift count, and it can use either a VSPLTISW or
XXSPLTIB instruction to load the shift count.
2024-09-17 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/89213
* config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
(VSHIFT_MODE): New mode iterator.
(vshift_code): New code iterator.
(vshift_attr): New code attribute.
(altivec_<mode>_<vshift_attr>_const): New pattern to optimize
vector long long/int shifts by a constant.
(altivec_<mode>_shift_const): New helper insn to load up a
constant used by the shift operation.
* config/rs6000/predicates.md (vector_shift_constant): New
predicate.
gcc/testsuite/
PR target/89213
* gcc.target/powerpc/pr89213.c: New test.
* gcc.target/powerpc/vec-rlmi-rlnm.c: Update instruction count.
The result of build_fold_indirect_ref can be a COMPONENT_REF in
which case using DECL_SOURCE_LOCATION will crash. Look at its op1
instead.
PR c++/116741
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression) <case CONVERT_EXPR>: If
the result of build_fold_indirect_ref is a COMPONENT_REF, use its op1.
Check DECL_P before calling inform.
gcc/testsuite/ChangeLog:
* g++.dg/cpp26/constexpr-voidptr4.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Pre r14-4793, we'd call warn_tautological_cmp -> operand_equal_p
with operands wrapped in NON_DEPENDENT_EXPR, which works, since
o_e_p bails for codes it doesn't know. But now we pass operands
not encapsulated in NON_DEPENDENT_EXPR, and crash, because the
template tree for &a[x] has null DECL_FIELD_OFFSET.
This patch extends r12-7797 to cover the case when DECL_FIELD_OFFSET
is null.
PR c++/116534
gcc/ChangeLog:
* fold-const.cc (operand_compare::operand_equal_p): If either
field's DECL_FIELD_OFFSET is null, compare the fields with ==.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wtautological-compare4.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
r12-3495 added maybe_warn_about_constant_value which will crash if
it gets a nameless VAR_DECL, which is what happens in this PR.
We created this VAR_DECL in cp_parser_decomposition_declaration.
PR c++/116676
gcc/cp/ChangeLog:
* constexpr.cc (maybe_warn_about_constant_value): Check DECL_NAME.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/constexpr-116676.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
This patch folds svdiv where one of the operands is all-zeros to a zero
vector, if one of the following conditions holds:
- the dividend is all zeros or
- the divisor is all zeros, and the predicate is ptrue or the predication
is _x or _z.
This case was not covered by the recent patch that implemented constant
folding, because that covered only cases where both operands are
constant vectors. Here, the operation is folded as soon as one of the operands
is a constant zero vector.
Folding of divison by 0 to return 0 is in accordance with
the semantics of sdiv and udiv.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Add folding of all-zero operands to zero vector.
gcc/testsuite/
* gcc.target/aarch64/sve/fold_div_zero.c: New test.
* gcc.target/aarch64/sve/const_fold_div_1.c: Adjust expected
outcome.
SVE's INDEX instruction can be used to populate vectors by values starting from
"base" and incremented by "step" for each subsequent value. We can take
advantage of it to generate vector constants if TARGET_SVE is available and the
base and step values are within [-16, 15].
For example, with the following function:
typedef int v4si __attribute__ ((vector_size (16)));
v4si
f_v4si (void)
{
return (v4si){ 0, 1, 2, 3 };
}
GCC currently generates:
f_v4si:
adrp x0, .LC4
ldr q0, [x0, #:lo12:.LC4]
ret
.LC4:
.word 0
.word 1
.word 2
.word 3
With this patch, we generate an INDEX instruction instead if TARGET_SVE is
available.
f_v4si:
index z0.s, #0, #1
ret
PR target/113328
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_simd_valid_immediate): Improve
handling of some ADVSIMD vectors by using SVE's INDEX if TARGET_SVE is
available.
(aarch64_output_simd_mov_immediate): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
SVE's INDEX instruction.
* gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
* gcc.target/aarch64/sve/vec_init_3.c: New test.
Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
This patch fixes the include directories used when building objects in
gm2-compiler-boot. It adds the missing gm2-gcc directory and uses a
new variable GM2_BOOT_INCLUDES for all gm2-compiler-boot rules.
gcc/m2/ChangeLog:
* Make-lang.in (GM2_BOOT_INCLUDES): New variable.
(m2/gm2-compiler-boot/M2GCCDeclare.o): Rewrite to use
GM2_BOOT_INCLUDES.
(m2/gm2-compiler-boot/M2Error.o): Ditto.
(m2/gm2-compiler-boot/%.o): Ditto.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
instructions like SHL have a throughput of 2. We can lean on that to emit code
like:
add z31.b, z31.b, z31.b
instead of:
lsl z31.b, z31.b, #1
The implementation of this change for SVE vectors is similar to a prior patch
<https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659958.html> that adds
the above functionality for Neon vectors.
Here, the machine descriptor pattern is split up to separately accommodate left
and right shifts, so we can specifically emit an add for all left shifts by 1.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*post_ra_v<optab><mode>3): Split pattern
to accomodate left and right shifts separately.
(*post_ra_v_ashl<mode>3): Matches left shifts with additional
constraint to check for shifts by 1.
(*post_ra_v_<optab><mode>3): Matches right shifts.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of lsl-1
with corresponding add.
* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/adr_1.c: Likewise.
* gcc.target/aarch64/sve/adr_6.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
* gcc.target/aarch64/sve/shift_2.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/sve_shl_add.c: New test.
The >= and < comparisons may skip comparing the lower bytes when
the according bytes of the constant are all zeros. For example,
uint16 >= 0x1200
is true iff
hi8 (uint16) >= hi8 (0x1200)
and similar for uint16 < 0x1200. Some comparisons against constants
that are an integral power of 256 where already handled in the split
preparation. That code has been outsourced to new avr_maybe_cmp_lsr()
which may change the operands such that the resulting insns become
a comparison of the high bytes against 0 plus a EQ / NE branch.
For example,
uint32 >= 0x10000
can be rewritten as
(uint32 >> 16) != 0.
The according asm output is performed by new avr_out_cmp_lsr().
gcc/
* config/avr/avr-protos.h (avr_out_cmp_lsr, avr_maybe_cmp_lsr): New.
* config/avr/avr.cc (avr_maybe_cmp_lsr, avr_out_cmp_lsr): New functions.
(avr_out_compare) [GEU, LTU]: Start output at byte CTZ(xval) / 8.
(avr_adjust_insn_length) [ADJUST_LEN_CMP_LSR]: Handle case.
* config/avr/avr.md (adjust_len) <cmp_lsr>: New attr value.
(*cmp<mode>_lsr): New define_insn_and_split.
(cbranch<mode>4_insn): When splitting, run avr_maybe_cmp_lsr()
which may map the operands to *cmp<mode>_lsr.
gcc/testsuite/
* gcc.target/avr/torture/cmp-lsr-i32.c: New test.
* gcc.target/avr/torture/cmp-lsr-u16.c: New test.
* gcc.target/avr/torture/cmp-lsr-u24.c: New test.
* gcc.target/avr/torture/cmp-lsr-u32.c: New test.
* gcc.target/avr/torture/cmp-lsr-u64.c: New test.
Use %= instead of maintaining a sequence number manually, so that it
doesn't result in a duplicate assembler label when the insn is duplicated.
PR target/116693
* config/riscv/riscv.cc (riscv_legitimize_tls_address): Don't pass
seqno to gen_tlsdesc and remove it.
* config/riscv/riscv.md (@tlsdesc<mode>): Remove operand 1. Use
%= instead of %1 in template.
These config files set default formatting behaviour for a large number
of common editors, see https://editorconfig.org
The root=true setting in libstdc++-v3/.editorconfig prevents looking in
parent directories for additional settings. If we add a .editorconfig at
the top-level we might want to use root=true there instead, and allow
libstdc++-v3/.editorconfig to inherit some some settings from there (and
only override things we want to do differently).
libstdc++-v3/ChangeLog:
* .editorconfig: New file.
* include/std/.editorconfig: New file.
While adding simple_dce_worklist to the vectorizer, there was a regression
due to the slp patterns would create a SSA name but never free it even if it
never existed in the IR (this case as addsub but complex ones had the same issue).
The reason why it was never freed was the stmt_vec_info was not marked as a pattern stmt,
unlike the other pattern stmts that use vect_init_pattern_stmt instead of vec_info::add_pattern_stmt
(which is used for SLP patterns).
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-vectorizer.cc (vec_info::add_pattern_stmt): Set pattern_stmt_p.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Use "rtx_code" for RTX codes, not "enum rtx_code" and not "RTX_CODE".
Drop enum and struct tags if possible.
gcc/
* config/avr/avr.cc: Use rtx_code for RTX codes.
Drop enum and struct tags.
* config/avr/avr.md: Same.
* config/avr/avr-c.cc: Same.
* config/avr/avr-dimode.md: Same.
* config/avr/avr-passes.cc: Same.
* config/avr/avr-protos.h: Same.
ADIW doesn't mix with CPC / SBIC because it's not only about
propagating the Z flag but also about carry.
gcc/
* config/avr/avr.cc (avr_out_compare): Don't mix ADIW with SBCI / CPC.
Remove the special handling of end of nested scalarization chains, which
advanced the chain to an element of a parent chain when the current one
was reaching its end.
That handling was superfluous as nested chains correspond to nested
scalarizations of subexpressions and the scalarizations don't extend beyond
their associated subexpression and don't use any scalarisation element from
the parent expression.
No change of behaviour, as the GFC_SE struct is supposed to be in its final
state anyway when the last element from the chain has been consumed.
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_advance_se_ss_chain): Don't use an element
from the parent scalarization chain when the current chain reaches
its end.
When we're explicitly choosing GCC extensions, we similarly shouldn't
complain about optional features that GCC provides. This particular pattern
of cast between function and object pointer is used by gthr-posix.h on some
targets, including linux-gnu before glibc 2.34.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_unary_expression) [RID_EXTENSION]: Also
suppress -Wconditionally-supported.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wconditionally-supported-1.C: Add __extension__ cases.
It seems more useful for a conversion to have the location of the source
expression rather than the enclosing expression, such as a call that might
convert multiple arguments in different ways.
As a result, in srcloc17.C the recorded location of 'e' when
copy-initialized became that of the initializer rather than the variable,
since the semantic was to convert the initializer (at its location) and then
initialize the variable from the resulting prvalue. If we instead
direct-initialize the variable, the location of the constructor call is that
of the variable.
gcc/cp/ChangeLog:
* call.cc (convert_like_internal) [ck_user]: Use iloc_sentinel.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/srcloc17.C: Adjust initialization.