When we decompose a complex load that is only used via its real and
imaginary parts, we fail to honor the IL constraint that a
BIT_FIELD_REF of register type should be outermost in a reference.
The following simply avoids the transform when the complex load has
such a BIT_FIELD_REF.
PR tree-optimization/117417
* tree-ssa-forwprop.cc (pass_forwprop::execute): Avoid
decomposing BIT_FIELD_REF complex load.
* gcc.dg/torture/pr117417.c: New testcase.
When optimizing away NOPs in the case of overlapping regs in VEC_SELECT
expressions, validate the subreg data before using simplify_subreg_regno.
There is no real SUBREG rtx here, only a pseudo subreg query to check
whether such a subreg would be possible.
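A minimal sketch of the added guard (hypothetical helper, not the exact
rtlanal.cc code): the "pseudo subreg" described by the source, the
destination mode and the offset must be representable before
simplify_subreg_regno can be consulted.

  /* Return the hard register the selected part of SRC would live in when
     viewed in DST's mode at OFFSET, or -1 if no such subreg is possible.  */
  static int
  noop_subreg_regno (rtx dst, rtx src, poly_uint64 offset)
  {
    if (!validate_subreg (GET_MODE (dst), GET_MODE (src), src, offset))
      return -1;
    return simplify_subreg_regno (REGNO (src), GET_MODE (src),
				  offset, GET_MODE (dst));
  }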
gcc/ChangeLog:
* rtlanal.cc (set_noop_p): Validate subreg constraints before checking
for overlapping regs using simplify_subreg_regno.
The following adds an X86_TUNE_AVX512_TWO_EPILOGUES tuning and, when it
is set, directs the vectorizer to produce both an AVX2 and an SSE vector
epilogue for AVX512-vectorized loops. The tuning is enabled by default
for Zen4 and Zen5, where I benchmarked it to be overall positive on
SPEC CPU 2017, both in performance and in overall code size. In
particular it speeds up 525.x264_r, which with only an AVX2 epilogue
currently ends up in unvectorized code.
* config/i386/i386.cc (ix86_vector_costs::finish_cost): Set
m_suggested_epilogue_mode according to X86_TUNE_AVX512_TWO_EPILOGUES.
* config/i386/x86-tune.def (X86_TUNE_AVX512_TWO_EPILOGUES): Add.
Enable for znver4 and znver5.
The following enables targets to suggest the vector mode to be used
preferably for the epilogue of a vectorized loop. The patch also
enables more than one vectorized epilogue in case the target suggests
a vector mode for the epilogue of a vector epilogue.
* tree-vectorizer.h (vector_costs::suggested_epilogue_mode): New.
(vector_costs::m_suggested_epilogue_mode): Likewise.
(vector_costs::vector_costs): Initialize m_suggested_epilogue_mode.
* tree-vect-loop.cc (vect_analyze_loop): Honor the target
suggested preferred epilogue mode and support vector epilogues
of vector epilogues if requested.
When we do SLP discovery of a .MASK_LOAD for a dataref group with gaps,
the discovery for the mask will have gaps as well, which was
unexpected in a few places. The following re-organizes things
slightly to accommodate this.
PR tree-optimization/117484
* tree-vect-slp.cc (vect_build_slp_tree_2): Handle gaps in
mask discovery. Fix condition to release the load permutation.
(vect_lower_load_permutations): Assert we get no load
permutation for the unpermuted node.
* tree-vect-slp-patterns.cc (linear_loads_p): Properly identify
loads (without permutation).
(compatible_complex_nodes_p): Likewise.
* gcc.dg/vect/pr117484-1.c: New testcase.
* gcc.dg/vect/pr117484-2.c: Likewise.
The following treats both the same when considering whether to use
gather or scatter for single-element interleaving accesses.
This will cause
FAIL: gcc.target/aarch64/sve/sve_iters_low_2.c scan-tree-dump-not vect "LOOP VECTORIZED"
where we now vectorize the loop with VNx4QI. I'll leave it to the Arm
folks to investigate whether that's OK and to adjust the testcase, or to
see where to adjust things to make the testcase not vectorized again.
The original fix for which the testcase was introduced is still effective.
PR tree-optimization/117502
* tree-vect-stmts.cc (get_group_load_store_type): Also consider
VMAT_STRIDED_SLP when checking whether to use gather/scatter for
single-element interleaving accesses.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): STMT_VINFO_STRIDED_P
can be classified as VMAT_GATHER_SCATTER, so update DR_REF for
those as well.
This patch implements transformations for the following optimizations:

  logN(x) CMP CST -> x CMP expN(CST)
  expN(x) CMP CST -> x CMP logN(CST)

where CMP expands to the ge and le operations.
For example:

  int
  foo (float x)
  {
    return __builtin_logf (x) <= 0.0f;
  }

can just be:

  int
  foo (float x)
  {
    return x <= 1.0f;
  }
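For the other direction (a hypothetical example along the same lines, not
taken from the patch), since expf is monotonically increasing,

  return __builtin_expf (x) <= 1.0f;

can likewise become

  return x <= 0.0f;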
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* match.pd: Fold logN(x) CMP CST -> x CMP expN(CST)
and expN(x) CMP CST -> x CMP logN(CST).
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/log_exp.c: New test.
The C++ front-end uses symbols from these directories, so they should also
be in TAGS.
gcc/cp/ChangeLog:
* Make-lang.in: Also collect tags from libcody and c++tools.
The C++ modules support is not targeting the Modules TS, so it doesn't make
much sense to refer to the TS in the option name. But keep the old spelling
as an undocumented alias for now.
gcc/ChangeLog:
* doc/invoke.texi: Rename -fmodules-ts to -fmodules.
gcc/c-family/ChangeLog:
* c.opt: Add -fmodules with same effect as -fmodules-ts.
gcc/cp/ChangeLog:
* lang-specs.h: Check fmodules* instead of fmodules-ts.
The init-list initialization of cl_deferred_option p had a couple of
narrowing warnings: first of opt_index from int to size_t and then of value
from HOST_WIDE_INT to int. Fixed by making the types more consistent.
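A minimal sketch of the kind of narrowing being diagnosed (hypothetical
struct and parameter types standing in for cl_deferred_option and
HOST_WIDE_INT):

  #include <cstddef>

  struct deferred_opt { std::size_t opt_index; int value; };

  void
  record (int opt_index, long long value)  /* long long stands in for HOST_WIDE_INT */
  {
    deferred_opt p { opt_index, value };   /* -Wnarrowing: int -> size_t, long long -> int */
    (void) p;
  }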
gcc/ChangeLog:
* opts.h (cl_deferred_option::value): Change to HOST_WIDE_INT.
(set_option): Change opt_index parm to size_t.
* opts-common.cc (set_option): Likewise.
Even though this PR is very close to PR117101, it's not addressed by the
fix I made through r15-4958-g5821f5c8c89a05 because cxx_placement_new_fn
has the very same issue as std_placement_new_fn_p used to have.
As suggested by Jason, this patch changes both functions so that
cxx_placement_new_fn leverages std_placement_new_fn_p which reduces code
duplication and fixes the PR.
PR c++/117463
gcc/cp/ChangeLog:
* constexpr.cc (cxx_placement_new_fn): Implement in terms of
std_placement_new_fn_p.
* cp-tree.h (std_placement_new_fn_p): Declare.
* init.cc (std_placement_new_fn_p): Add missing checks to ensure
that fndecl is a non-replaceable ::operator new.
gcc/testsuite/ChangeLog:
* g++.dg/init/new54.C: New test.
clang++ adds __builtin_operator_{new,delete} builtins which, as documented,
work like calls to ::operator {new,delete}, except that it is an error
if the selected ::operator {new,delete} is not a replaceable global operator,
and they permit the optimizations which C++ normally allows only when those
operators are invoked from new/delete expressions
(https://eel.is/c++draft/expr.new#14).  When these builtins are used, the
same optimizations can therefore be performed for direct calls as well.
For GCC we note that in the CALL_FROM_NEW_OR_DELETE_P flag on CALL_EXPRs.
The following patch implements them as C++ FE keywords (because passing
references through ... changes the argument, so a BUILT_IN_FRONTEND
builtin can't be used); it just attempts to call the ::operator {new,delete}
and diagnoses it if that operator isn't replaceable.
libstdc++ already uses the builtin in some cases.
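A usage sketch (hypothetical code, but mirroring the clang semantics being
adopted): the calls must resolve to the replaceable global operators and may
then be optimized like new/delete expressions.

  void
  f ()
  {
    void *p = __builtin_operator_new (sizeof (int));
    __builtin_operator_delete (p);
    /* A dead pair like this may be elided entirely.  */
  }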
2024-11-11 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c-common.h (enum rid): Add RID_BUILTIN_OPERATOR_NEW
and RID_BUILTIN_OPERATOR_DELETE.
(names_builtin_p): Change return type from bool to int.
* c-common.cc (c_common_reswords): Add __builtin_operator_new
and __builtin_operator_delete.
gcc/c/
* c-decl.cc (names_builtin_p): Change return type from
bool to int, adjust return statements.
gcc/cp/
* parser.cc (cp_parser_postfix_expression): Handle
RID_BUILTIN_OPERATOR_NEW and RID_BUILTIN_OPERATOR_DELETE.
* cp-objcp-common.cc (names_builtin_p): Change return type from
bool to int, adjust return statements. Handle
RID_BUILTIN_OPERATOR_NEW and RID_BUILTIN_OPERATOR_DELETE.
* pt.cc (tsubst_expr) <case CALL_EXPR>: Handle
CALL_FROM_NEW_OR_DELETE_P.
gcc/
* doc/extend.texi (New/Delete Builtins): Document
__builtin_operator_new and __builtin_operator_delete.
gcc/testsuite/
* g++.dg/ext/builtin-operator-new-1.C: New test.
* g++.dg/ext/builtin-operator-new-2.C: New test.
* g++.dg/ext/builtin-operator-new-3.C: New test.
The std::logic_error exceptions thrown from misuses of
std::wbuffer_convert and std::wstring_convert should use names qualified
with "std::".
libstdc++-v3/ChangeLog:
* include/bits/locale_conv.h (wstring_convert, wbuffer_convert):
Adjust strings passed to exception constructors.
The intended behaviour for std::text_encoding::aliases_view's iterator
is that incrementing or decrementing it too far sets it to a
value-initialized state, or fails an assertion when assertions are enabled.
There were typos that used == instead of =, which meant that instead of
becoming singular or aborting, an out-of-range increment just did
nothing.  This meant the erroneous operations were well-defined and didn't
produce any undefined behaviour, but also that they were not diagnosed
with assertions enabled, as had been intended.
This change fixes the bugs and adds more tests to verify the intended
behaviour.
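A minimal analogue of the typo class being fixed (not the actual libstdc++
code): the comparison discards its result, so the out-of-range operation
silently leaves the iterator unchanged instead of resetting it to a
value-initialized state.

  struct It
  {
    const char *pos = nullptr;
    bool operator== (const It &) const = default;  // C++20
  };

  void
  step_past_end (It &it)
  {
    it == It ();  // bug: comparison, a silent no-op
    it = It ();   // fix: assignment, makes the iterator singular
  }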
libstdc++-v3/ChangeLog:
PR libstdc++/117520
* include/std/text_encoding (aliases_view::_Iterator::operator+=):
Fix typos that caused == to be used instead of =.
(aliases_view::_Iterator): Fix friend declaration.
* testsuite/std/text_encoding/members.cc: Adjust expected
behaviour of invalid subscript. Add tests for other erroneous
operations on iterators.
Since some of the c2y-if-decls tests use _Atomic, add a
requirement for the target to support atomic operations on
int and long types.
This fixes spurious test link failures on pru-unknown-elf,
which lacks atomic ops. The tests still pass on x86_64-linux-gnu.
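Presumably the requirement is expressed with the usual effective-target
directive, along these lines (the exact effective-target name is an
assumption here):

  /* { dg-require-effective-target sync_int_long } */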
gcc/testsuite/ChangeLog:
* gcc.dg/c2y-if-decls-1.c: Require target that supports atomic
operations on int and long types.
* gcc.dg/c2y-if-decls-11.c: Ditto.
* gcc.dg/c2y-if-decls-4.c: Ditto.
* gcc.dg/c2y-if-decls-8.c: Ditto.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
With the change in r15-3128-gde1923f9f4d, this test case no longer xfails.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Remove
xfail from test.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
According to the aapcs64: If the argument is an 8-bit (...) precision
Floating-point or short vector type and the NSRN is less than 8, then the
argument is allocated to the least significant bits of register v[NSRN].
gcc/
* config/aarch64/aarch64.cc
(aarch64_vfp_is_call_or_return_candidate): Use FP registers to
return svmfloat8_t parameters.
gcc/testsuite/
* gcc.target/aarch64/fp8_scalar_1.c:
Lewis' r15-5067 fixing the marking of TRAIT_EXPR led me to compare some
other front-end type definitions to their marking in cp_common_init_ts; it
seems we can change tree_common to something smaller in several cases, to
match how they are marked.
gcc/cp/ChangeLog:
* cp-tree.h (struct ptrmem_cst): Change tree_common to tree_typed.
(struct tree_trait_expr): Likewise.
(struct tree_static_assert): Change tree_common to tree_base.
(struct tree_argument_pack_select): Likewise.
On my hybrid x86 system with E and P cores, the spincount defaults to 1
rather than 300000, cf. PR109812 and r14-4571-ge1e127de18dbee.
Hence, this commit updates the expected value of the testcase to also
accept omp_display_env showing "GOMP_SPINCOUNT = '1'" - but only for
x86-64, which might be hybrid.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/pr109062.c: Update dg-output
to also accept GOMP_SPINCOUNT = 1 for x86-64.
This was responsible for a bunch of SVE FAILs with --param vect-force-slp=1.
* tree-vect-slp.cc (arg1_arg3_map): New.
(arg1_arg3_arg4_map): Likewise.
(vect_get_operand_map): Handle IFN_SCATTER_STORE,
IFN_MASK_SCATTER_STORE and IFN_MASK_LEN_SCATTER_STORE.
(vect_build_slp_tree_1): Likewise.
* tree-vect-stmts.cc (vectorizable_store): For SLP masked
gather/scatter record the mask with proper number of copies.
* tree-vect-loop.cc (vectorizable_recurr): Avoid costing
the initial value construction in the prologue twice with SLP.
Previous patches are supposed to add full support for SVE2.1,
so this patch advertises that through __ARM_FEATURE_SVE2p1.
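A user-side check for the new macro would look like this (hypothetical
snippet, not part of the patch):

  #if defined (__ARM_FEATURE_SVE2p1)
  /* SVE2.1 intrinsics and instructions may be used here.  */
  #endif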
pragma_cpp_predefs_3.c had one fewer pop than push. The final
test is triple-nested:
- armv8-a (to start with a clean slate, untainted by command-line flags)
- the maximal SVE set
- general-regs-only
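Schematically, the nesting looks like this (a hypothetical sketch, not the
literal test contents):

  #pragma GCC push_options
  #pragma GCC target "arch=armv8-a"      /* clean slate */
  #pragma GCC push_options
  #pragma GCC target "+sve2p1"           /* stand-in for the maximal SVE set */
  #pragma GCC push_options
  #pragma GCC target "general-regs-only"
  /* ... feature-macro checks ... */
  #pragma GCC pop_options
  #pragma GCC pop_options
  #pragma GCC pop_options                /* presumably the pop that was missing */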
gcc/
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Handle
__ARM_FEATURE_SVE2p1.
gcc/testsuite/
* gcc.target/aarch64/pragma_cpp_predefs_3.c: Add SVE2p1 tests.
This patch moves the scalar and single-vector Advanced SIMD types
from arm_neon.h into a private header, so that they can be defined
by arm_sve.h as well. This is needed for the upcoming SVE2.1
hybrid-VLA reductions, which return 128-bit Advanced SIMD vectors.
The approach follows Claudio's patch for FP8.
gcc/
* config.gcc (extra_headers): Add arm_private_neon_types.h.
* config/aarch64/arm_private_neon_types.h: New file, split out
from...
* config/aarch64/arm_neon.h: ...here.
* config/aarch64/arm_sve.h: Include arm_private_neon_types.h.
This patch adds an svboolx4_t type, to go alongside the existing
svboolx2_t type. It doesn't require any special ISA support beyond
SVE itself and it currently has no associated instructions.
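For illustration, a hypothetical use of the new type together with the _b
variants added here (requires a compiler containing this patch):

  #include <arm_sve.h>

  svbool_t
  f (svbool_t p0, svbool_t p1)
  {
    svboolx4_t t = svcreate4_b (p0, p1, p0, p1);
    t = svset4_b (t, 2, svundef_b ());
    return svget4_b (t, 3);
  }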
gcc/
* config/aarch64/aarch64-modes.def (VNx64BI): New mode.
* config/aarch64/aarch64-protos.h
(aarch64_split_double_move): Generalize to...
(aarch64_split_move): ...this.
* config/aarch64/aarch64-sve-builtins-base.def (svcreate4, svget4)
(svset4, svundef4): Add bool variants.
* config/aarch64/aarch64-sve-builtins.cc (handle_arm_sve_h): Add
svboolx4_t.
* config/aarch64/iterators.md (SVE_STRUCT_BI): New mode iterator.
* config/aarch64/aarch64-sve.md (movvnx32bi): Generalize to...
(mov<SVE_STRUCT_BI:mode>): ...this.
* config/aarch64/aarch64.cc
(pure_scalable_type_info::piece::get_rtx): Allow num_prs to be 4.
(aarch64_classify_vector_mode): Handle VNx64BI.
(aarch64_hard_regno_nregs): Likewise.
(aarch64_class_max_nregs): Likewise.
(aarch64_array_mode): Use VNx64BI for arrays of 4 svbool_ts.
(aarch64_split_double_move): Generalize to...
(aarch64_split_move): ...this.
(aarch64_split_128bit_move): Update call accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/create_5.c: Expect svcreate4
to succeed for svbool_ts.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_UNDEF_B): New macro.
* gcc.target/aarch64/sve/acle/asm/create4_1.c: Test _b form.
* gcc.target/aarch64/sve/acle/asm/undef2_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/undef4_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get4_b.c: New test.
* gcc.target/aarch64/sve/acle/asm/set4_b.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/svboolx4_1.c: Likewise.
This patch factors out some of ext_def into a base class,
so that it can be reused for the SVE2.1 svextq intrinsic.
gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc (ext_base): New base
class, extracted from...
(ext_def): ...here.
All extending gather load intrinsics encode the source type in
their name (e.g. svld1sb for an extending load from signed bytes).
The type of the extension result has to be specified using an
explicit type suffix; it isn't something that can be inferred
from the arguments, since there are multiple valid choices for
the same arguments.
This meant that type inference for gather loads was only needed for
non-extending loads, in which case the pointer target had to be a
32-bit or 64-bit element type. The gather_scatter_p argument to
function_resolver::infer_pointer_type therefore controlled two things:
how we should react to vector base addresses, and whether we should
require a minimum element size of 32.
The element size restriction doesn't apply to the upcoming SVE2.1
svld1q intrinsic, so this patch adds a separate argument for the minimum
element size requirement.
gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_resolver::target_type_restrictions): New enum.
(function_resolver::infer_pointer_type): Add an extra argument
that specifies what the target type can be.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::infer_pointer_type): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(load_gather_sv_base::get_target_type_restrictions): New virtual
member function.
(load_gather_sv_base::resolve): Use it. Update call to
infer_pointer_type.
Until now, all data arguments to a scatter store needed to have
32-bit or 64-bit elements. This isn't true for the upcoming SVE2.1
svst1q scatter intrinsic, so this patch adds an abstraction around the
restriction.
gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc
(store_scatter_base::infer_vector_type): New virtual member function.
(store_scatter_base::resolve): Use it.
In the upcoming SVE2.1 svld1q and svst1q intrinsics, the relationship
between the base vector and the data vector differs from existing
gather/scatter intrinsics. This patch adds a new abstraction to
handle the difference.
gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_shape::vector_base_type): New member function.
* config/aarch64/aarch64-sve-builtins.cc
(function_shape::vector_base_type): Likewise.
(function_resolver::resolve_sv_displacement): Use it.
(function_resolver::resolve_gather_address): Likewise.
GCC previously used the older assembly syntax for SVE TBL, with no
braces around the second operand. This patch switches to the newer,
official syntax, with braces around the operand.
The initial SVE binutils submission supported both syntaxes, so there
should be no issues with backwards compatibility.
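For example (with hypothetical operands), a permute that was previously
emitted as

  tbl	z0.s, z1.s, z2.s

is now emitted as

  tbl	z0.s, { z1.s }, z2.s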
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_sve_tbl<mode>): Wrap
the second operand in braces.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c: Wrap the second
TBL operand in braces.
* gcc.target/aarch64/sve/acle/asm/dup_lane_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_u8.c: Likewise.
* gcc.target/aarch64/sve/slp_perm_6.c: Likewise.
* gcc.target/aarch64/sve/slp_perm_7.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1_overrun.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_single_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_single_1.c: Likewise.
* gcc.target/aarch64/sve/uzp1_1.c: Shorten the scan-assembler-nots
to just "\ttbl\".
* gcc.target/aarch64/sve/uzp2_1.c: Likewise.
Past extensions to SVE have required new subsets of all_data; the
SVE2.1 patches will add another. This patch tries to make this more
scalable by defining the multi-size *_data macros to be unions of
single-size *_data macros.
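Schematically (a hypothetical sketch; the real lists in
aarch64-sve-builtins.cc contain more types):

  #define TYPES_h_data(S, D)  S (bf16), S (f16), S (s16), S (u16)
  #define TYPES_s_data(S, D)  S (f32), S (s32), S (u32)
  #define TYPES_hs_data(S, D) TYPES_h_data (S, D), TYPES_s_data (S, D)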
gcc/
* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_data): Redefine
in terms of single-size *_data definitions.
(TYPES_bhs_data, TYPES_hs_data, TYPES_sd_data): Likewise.
(TYPES_b_data, TYPES_h_data, TYPES_s_data): New macros.
g:ede97598e2c recorded separate ISA requirements for streaming
and non-streaming mode. The premise there was that AARCH64_FL_SME
should not be included in the streaming mode requirements, since:
(a) an __arm_streaming_compatible function wouldn't be in streaming
mode if SME wasn't available.
(b) __arm_streaming_compatible functions only allow things that are
possible in non-streaming mode, so the non-streaming architecture
is enough to assemble the code, even if +sme isn't enabled.
(c) we reject __arm_streaming if +sme isn't enabled, so don't need
to test it for individual intrinsics as well.
Later patches lean into this further.
This patch applies the same reasoning to the .md constructs for
base streaming-only SME instructions, guarding them with
TARGET_STREAMING rather than TARGET_STREAMING_SME.
gcc/
* config/aarch64/aarch64.h (TARGET_SME): Expand comment.
(TARGET_STREAMING_SME): Delete.
* config/aarch64/aarch64-sme.md: Use TARGET_STREAMING instead of
TARGET_STREAMING_SME.
* config/aarch64/aarch64-sve2.md: Likewise.
Some code was checking TARGET_STREAMING and TARGET_SME2 separately,
but we now have a macro to test both at once.
gcc/
* config/aarch64/aarch64-sme.md: Use TARGET_STREAMING_SME2
instead of separate TARGET_STREAMING and TARGET_SME2 tests.
* config/aarch64/aarch64-sve2.md: Likewise.
* config/aarch64/iterators.md: Likewise.
For the aarch64 simd clones patches, it would be useful to be able to
push a function declaration onto the cfun stack, even though it has no
function body associated with it. That is, we want cfun to be null,
current_function_decl to be the decl itself, and the target and
optimisation flags to reflect the declaration.
This patch adds a push/pop_function_decl pair to do that.
I think the more direct way of doing what I want to do under the
existing interface would have been:
  push_cfun (nullptr);
  invoke_set_current_function_hook (fndecl);
  pop_cfun ();
where invoke_set_current_function_hook would need to become public.
But it seemed safer to use the higher-level routines, since it makes
sure that the target/optimisation changes are synchronised with the
function changes. In particular, if cfun was null before the
sequence above, the pop_cfun would leave the flags unchanged,
rather than restore them to the state before the push_cfun.
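A usage sketch of the new interface (hypothetical call site; the real users
arrive with the simd-clones patches):

  /* Make FNDECL current, with its target and optimization flags in
     effect, even though it has no body; cfun stays null.  */
  push_function_decl (fndecl);
  /* ... inspect or create target-dependent state for FNDECL ... */
  pop_function_decl ();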
gcc/
* function.h (push_function_decl, pop_function_decl): Declare.
* function.cc (set_function_decl): New function, extracted from...
(set_cfun): ...here.
(push_function_decl): New function, extracted from...
(push_cfun): ...here.
(pop_cfun_1): New function, extracted from...
(pop_cfun): ...here.
(pop_function_decl): New function.
2024-11-10 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/109345
* trans-array.cc (gfc_get_array_span): Unlimited polymorphic
expressions are now treated separately since the span need not
be the same as the element size.
gcc/testsuite/
PR fortran/109345
* gfortran.dg/character_workout_1.f90: Cut trailing whitespace.
* gfortran.dg/pr109345.f90: New test.
For the loop in the testcase we currently fail to hoist the guard
check of the inner loop (m > 0) out of the outer loop because
find_loop_guard checks all blocks of the outer loop for side-effects,
including those that are skipped by the guard.  This is usually
harmless, as the guard does not skip any blocks in the outer loop, but
in this case store-motion was applied to the inner loop and thus
there is now a skipped store in the outer loop.
The following properly skips blocks that are dominated by the
entry to the skipped region.
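A sketch of the situation (hypothetical code, not the PR testcase): after
store-motion of the inner loop, the store sits in the outer loop but only on
the guarded path, so it must not block hoisting the guard.

  void
  f (int *p, int n, int m)
  {
    for (int i = 0; i < n; ++i)
      if (m > 0)                    /* guard of the inner loop */
        {
          int tmp = *p;
          for (int j = 0; j < m; ++j)
            tmp += j;
          *p = tmp;                 /* store created by store-motion */
        }
  }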
PR tree-optimization/117510
* tree-ssa-loop-unswitch.cc (find_loop_guard): Only check
not skipped blocks for side-effects.
* gcc.dg/vect/vect-outer-pr117510.c: New testcase.
Just noticed that the indentation is not right for the ustrunc pattern
in the md file, so make it correct.  The change is fairly obvious; I will
commit it after the next 48H if there are no comments.
gcc/ChangeLog:
* config/riscv/autovec.md: Fix indent format issue.
Signed-off-by: Pan Li <pan2.li@intel.com>
2024-11-11 Tomas Trnka <trnka@scm.com>
Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/116388
* class.cc (finalize_component): Use a leading underscore in the name
of 'byte_stride' to suppress invalid finalization.
gcc/testsuite/
PR fortran/116388
* gfortran.dg/finalize_58.f90: New test.
gcc/fortran/ChangeLog:
* check.cc (gfc_check_complex): Reject UNSIGNED.
* gfortran.texi: Update example program. Note that
CMPLX, INT and REAL also take unsigned arguments.
* intrinsic.texi (CMPLX): Document UNSIGNED.
(INT): Likewise.
(REAL): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/unsigned_41.f90: New test.
Explain that 'bootstrap-ubsan' won't abort on errors by default and how
to override that by setting UBSAN_OPTIONS.
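For example (a hypothetical invocation), setting
UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1 in the environment before
running the instrumented compiler turns the reports into hard failures.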
gcc/ChangeLog:
PR other/116948
* doc/install.texi (Building a native compiler): Document UBSAN_OPTIONS.
The second source register of insn "*extzvsi-1bit_addsubx" cannot be the
same as the destination register, because that register will be overwritten
with an intermediate value after insn splitting.
  /* example #1 */
  int test1(int b, int a) {
    return ((a & 1024) ? 4 : 0) + b;
  }

  ;; result #1 (incorrect)
  test1:
	extui	a2, a3, 10, 1	;; overwrites A2 before used
	addx4	a2, a2, a2
	ret.n
This patch fixes that.
  ;; result #1 (correct)
  test1:
	extui	a3, a3, 10, 1	;; uses A3 and then overwrites
	addx4	a2, a3, a2
	ret.n
However, it should be noted that the first source register can be the same
as the destination without any problems.
  /* example #2 */
  int test2(int a, int b) {
    return ((a & 1024) ? 4 : 0) + b;
  }

  ;; result (correct)
  test2:
	extui	a2, a2, 10, 1	;; uses A2 and then overwrites
	addx4	a2, a2, a3
	ret.n
gcc/ChangeLog:
* config/xtensa/xtensa.md (*extzvsi-1bit_addsubx):
Add '&' to the destination register constraint to indicate that
it is 'earlyclobber', append '0' to the first source register
constraint to indicate that it can be the same as the destination
register, and change the split condition from 1 to reload_completed
so that the insn will be split only after RA in order to obtain
allocated registers that satisfy the above constraints.