diagnostic_manager::emit_saved_diagnostic makes a useless clone
of global_dc->m_printer; remove it.
No functional change intended.
gcc/analyzer/ChangeLog:
PR other/116613
* diagnostic-manager.cc (diagnostic_manager::emit_saved_diagnostic):
Remove redundant 'pp'.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
PR 116529 shows that std::unique_ptr<X&, D> is currently unusable
because the constructor taking std::auto_ptr (which is a non-standard
extension since C++17) tries to form the invalid type X&* during
overload resolution. We can use the `pointer` type in the constructor
constraints, instead of trying to form an invalid type. The
std::auto_ptr constructor can never actually match for the case where
element_type is a reference, so we just need it to produce a
substitution failure instead of being ill-formed.
LWG 4144 might make std::unique_ptr<X&, D> ill-formed, which would
invalidate this new test. We would have to remove this test in that
case. Using `pointer` in the constructor from std::auto_ptr would then no
longer be needed to support the std::unique_ptr<X&, D> case, but it would
not cause any harm either.
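As a hedged illustration (the class and deleter names are made up, not taken
from the new 116529.cc test), this is the kind of code that becomes usable:
the deleter supplies a `pointer` type, so element_type X& never requires the
invalid type X&* to be formed.
#include <memory>

struct X { };

// RefDeleter provides a 'pointer' typedef, so unique_ptr<X&, RefDeleter>
// stores an X* even though element_type is the reference type X&.
struct RefDeleter {
  using pointer = X*;
  void operator() (X*) const { }
};

int main ()
{
  // Before the fix, overload resolution over unique_ptr's constructors could
  // try to form X&* via the deprecated auto_ptr converting constructor and
  // reject this; now that constructor just drops out via substitution failure.
  std::unique_ptr<X&, RefDeleter> up (nullptr, RefDeleter ());
  (void) up;
}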
libstdc++-v3/ChangeLog:
PR libstdc++/116529
* include/bits/unique_ptr.h (unique_ptr(auto_ptr<U>&&)):
Use pointer instead of T*.
* testsuite/20_util/unique_ptr/creation/116529.cc: New test.
There are several features that are not supported when using the old
std::string ABI. It's possible that PR 81967 will get fixed, but the
missing C++20 features almost certainly won't be. Document this in the
manual.
libstdc++-v3/ChangeLog:
PR libstdc++/116777
* doc/xml/manual/using.xml: Document features that are not
supported for the gcc4-compatible ABI.
* doc/html/manual/using_dual_abi.html: Regenerate.
When checking for compatibility of structure or union types in
tagged_types_tu_compatible_p, restore the old value of the pointer to
the top of the temporary cache after recursively calling comptypes_internal
when looping over the members of a structure or union. While the next
iteration of the loop overwrites the pointer, I missed the fact that it can
be accessed again when types of function arguments are compared as part
of recursive type checking and the function is entered again.
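A hedged sketch (not the actual gcc.dg/pr116726.c testcase) of the shape of
code exercising this path: with C23 tag compatibility (-std=c23), checking
the redefinition of struct s compares member f, whose function type has a
struct s argument, so tagged_types_tu_compatible_p is entered again while
the first invocation's cache pointer is still live.
struct s { int (*f) (struct s); int x; };
struct s { int (*f) (struct s); int x; };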
PR c/116726
gcc/c/ChangeLog:
* c-typeck.cc (tagged_types_tu_compatible_p): Restore value
of the cache after recursing into comptypes_internal.
gcc/testsuite/ChangeLog:
* gcc.dg/pr116726.c: New test.
As a follow-up to r15-3741-gee3efe06c9c49c, which was specifically
concerned with usings, it seems the CWG 2789 refinement should also
compare contexts of a reversed vs non-reversed (member) candidate
during operator overload resolution.
DR 2789
gcc/cp/ChangeLog:
* call.cc (cand_parms_match): Check for matching class contexts
even in the reversed case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memfun4.C: Adjust expected result
involving reversed candidate.
Reviewed-by: Jason Merrill <jason@redhat.com>
This patch removes unused parameters in gm2-compiler/M2Check.mod.
It also removes a --fixme-- and completes the missing code
which type checks unbounded arrays. The patch also fixes a
build error seen when building m2/stage2/cc1gm2.
gcc/m2/ChangeLog:
* gm2-compiler/M2Check.mod (checkUnboundedArray): New
procedure function.
(checkUnboundedUnbounded): Ditto.
(checkUnbounded): Rewrite to check the unbounded data
type.
(checkPair): Add comment.
(doCheckPair): Add comment.
Remove tinfo parameter from the call to checkTypeKindViolation.
(checkTypeKindViolation): Remove unused parameter tinfo.
* gm2-libs-ch/UnixArgs.cc (GM2RTS.h): Remove include.
* gm2-libs-ch/m2rts.h (M2RTS_INIT): New define.
(M2RTS_DEP): Ditto.
(M2RTS_RegisterModule): New prototype.
(GM2RTS.h): Add include to the MC_M2 block.
gcc/testsuite/ChangeLog:
* gm2/iso/fail/testarrayunbounded2.mod: New test.
* gm2/iso/fail/testarrayunbounded3.mod: New test.
* gm2/iso/fail/testarrayunbounded4.mod: New test.
* gm2/iso/fail/testarrayunbounded5.mod: New test.
* gm2/iso/fail/testarrayunbounded6.mod: New test.
* gm2/iso/pass/testarrayunbounded.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
After CWG 2789, the "more constrained" tiebreaker for non-template
functions should exclude member functions that are defined in
different classes. This patch implements this missing refinement.
In turn we can get rid of the four-parameter version of object_parms_correspond
and call the main overload directly, since now correspondence is only
checked for members from the same class.
PR c++/116492
DR 2789
gcc/cp/ChangeLog:
* call.cc (object_parms_correspond): Remove.
(cand_parms_match): Return false for member functions that come
from different classes. Adjust call to object_parms_correspond.
(joust): Update comment for the non-template "more constrained"
case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memfun4.C: Also compile in C++20 mode.
Expect ambiguity when candidates come from different classes.
* g++.dg/cpp2a/concepts-inherit-ctor12.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Our implementation of the CWG 2273 inheritedness tiebreaker seems to be
incorrectly considering all member functions introduced via using, not
just constructors. This patch restricts the tiebreaker accordingly.
DR 2273
gcc/cp/ChangeLog:
* call.cc (joust): Restrict inheritedness tiebreaker to
constructors.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/using1.C: Expect ambiguity for non-constructor call.
* g++.dg/overload/using5.C: Likewise.
Reviewed-by: Jason Merrill <jason@redhat.com>
This defines VECTOR_STORE_FLAG_VALUE to CONST1_RTX for AArch64
so we simplify vector comparisons in AArch64.
With this enabled
res:
movi v0.4s, 0
cmeq v0.4s, v0.4s, v0.4s
ret
is simplified to:
res:
mvni v0.4s, 0
ret
gcc/ChangeLog:
* config/aarch64/aarch64.h (VECTOR_STORE_FLAG_VALUE): New.
gcc/testsuite/ChangeLog:
* gcc.dg/rtl/aarch64/vector-eq.c: New test.
The testcase for this needs Neoverse V2 to be used since, due to costing,
the other cost models don't pick this particular SVE mode.
committed as obvious.
Thanks,
Tamar
gcc/testsuite/ChangeLog:
PR tree-optimization/116628
* gcc.dg/vect/pr116628.c: Update cmdline.
Recent versions of the Xcode assembler (as) require a dash to read from
standard input. We can use this on all supported OS versions, so make it
unconditional. Patch from Mark Mentovai.
gcc/ChangeLog:
* config/darwin.h (AS_NEEDS_DASH_FOR_PIPED_INPUT): New.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Now that we have separated the codegen of the ramp, actor and
destroy functions, we no longer need to manage the scopes for
variables manually.
This introduces a helper function that allows us to build a
local var with a DECL_VALUE_EXPR that relates to the coroutine
state frame entry.
This fixes a latent issue where we would generate guard vars
when exceptions were disabled.
gcc/cp/ChangeLog:
* coroutines.cc (coro_build_artificial_var_with_dve): New.
(coro_build_and_push_artificial_var): New.
(coro_build_and_push_artificial_var_with_dve): New.
(analyze_fn_parms): Ensure that frame entries cannot clash
with local variables.
(build_coroutine_frame_delete_expr): Amend comment.
(cp_coroutine_transform::build_ramp_function): Rework to
avoid manual management of variables and scopes.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc.dg/vect/vect-pr111779.c is a case where non-SLP manages to vectorize
using VMAT_ELEMENTWISE but SLP currently refuses because doing a regular
access with permutes would cause excess vector loads with at most one
element used. The following makes us fall back to elementwise accesses
for that, too.
* tree-vect-stmts.cc (get_group_load_store_type): Fall back
to VMAT_ELEMENTWISE when single element interleaving of
a too large group.
(vectorizable_load): Do not try to verify load permutations
when using VMAT_ELEMENTWISE for single-lane SLP and fix code
generation for this case.
* gfortran.dg/vect/vect-8.f90: Allow one more vectorized loop.
gcc.dg/vect/vect-live-2.c shows it's important to handle live but
otherwise unused pattern stmts.
* tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when
discovering from only-live roots.
Since the old reload pass is about to be removed and we defaulted to LRA
for over a decade, remove option -m{,no-}lra.
PR target/113953
gcc/ChangeLog:
* config/s390/s390.cc (s390_lra_p): Remove.
(TARGET_LRA_P): Remove.
* config/s390/s390.opt (mlra): Remove.
* config/s390/s390.opt.urls (mlra): Remove.
gcc/testsuite/ChangeLog:
* gcc.target/s390/TI-constants-nolra.c: Removed.
* gcc.target/s390/pr79895.c: Removed.
With SLP this token appears a lot; when looking for what gets code
generated, look for " = VEC_PERM_EXPR" instead.
PR testsuite/116397
* gcc.dg/vect/slp-reduc-3.c: Look for " = VEC_PERM_EXPR"
instead of "VEC_PERM_EXPR".
When a memory copy operation is analyzed by analyze_ssa_name, if both the
load and store are made through the same SSA name, the store is overlooked.
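A hedged sketch (not the actual gcc.dg/ipa/modref-4.c test) of such a copy,
where the same SSA name is the base of both the load and the store:
struct S { int x[4]; };

void
copy_within (struct S *p, int i, int j)
{
  /* Aggregate copy: the load from p[j] and the store to p[i] both go
     through the same SSA name 'p', so the store side must be analyzed
     as well.  */
  p[i] = p[j];
}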
gcc/
* ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Always
process both the load and the store of a memory copy operation.
gcc/testsuite/
* gcc.dg/ipa/modref-4.c: New test.
These TR13/OpenMP 6.0 routines permit reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique;
the one for the host is not.
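A hedged usage sketch of the two new routines; the prototypes assumed here
(a const char * UID and an int device number, as declared in omp.h per TR13)
are an assumption of this illustration, not part of the patch text.
#include <omp.h>
#include <stdio.h>

int
main (void)
{
  /* Device numbers 0 .. omp_get_num_devices () - 1 are the offload devices;
     with default numbering the host comes last.  */
  for (int dev = 0; dev <= omp_get_num_devices (); dev++)
    {
      const char *uid = omp_get_uid_from_device (dev);
      /* Round-trip: the UID maps back to the same device number.  */
      printf ("device %d: uid %s -> %d\n", dev, uid,
              omp_get_device_from_uid (uid));
    }
  return 0;
}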
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Add
get_device_from_uid and omp_get_uid_from_device routines.
include/ChangeLog:
* cuda/cuda.h (cuDeviceGetUuid): Declare.
(cuDeviceGetUuid_v2): Add prototype.
libgomp/ChangeLog:
* config/gcn/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Add stub implementation.
* config/nvptx/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Likewise.
* fortran.c (omp_get_uid_from_device_,
omp_get_uid_from_device_8_): New functions.
* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
* libgomp.map (GOMP_6.0): New, including the new UID routines.
* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
(Device Information Routines): Document new UID routines.
(Offload-Target Specifics): Document UID format.
* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
New prototype.
* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
New interface.
* omp_lib.h.in: Likewise.
* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
* target.c (str_omp_initial_device): New static var.
(STR_OMP_DEV_PREFIX): Define.
(gomp_get_uid_for_device, omp_get_uid_from_device,
omp_get_device_from_uid): New.
(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
(gomp_target_init): Set the device's 'uid' field to NULL.
* testsuite/libgomp.c/device_uid.c: New test.
* testsuite/libgomp.fortran/device_uid.f90: New test.
The min/max patterns for intrinsics which on x86 result in the second
input operand if the two operands are both zeros or one or both of them
are a NaN shouldn't use SMIN/SMAX RTL, because, similarly to
MIN_EXPR/MAX_EXPR, it is undefined what the result will be in those cases.
The following patch adds an expander which uses either a new pattern with
UNSPEC_IEEE_M{AX,IN} or the S{MIN,MAX} representation of the same.
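For illustration (a hedged sketch, not part of the patch), these are the
scalar semantics the expander must preserve: in the zero/NaN cases the
result comes from the second source operand, so the operation is not
commutative and cannot be modelled by plain SMIN/SMAX.
#include <emmintrin.h>

double
ieee_min (double a, double b)
{
  /* MINSD: if both operands are zeros (of either sign) or either one is a
     NaN, the result is the second source operand 'b'.  */
  return _mm_cvtsd_f64 (_mm_min_sd (_mm_set_sd (a), _mm_set_sd (b)));
}

/* ieee_min (0.0, -0.0) is -0.0 but ieee_min (-0.0, 0.0) is 0.0, and
   ieee_min (NAN, 1.0) is 1.0 while ieee_min (1.0, NAN) is NaN.  */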
2024-09-20 Uros Bizjak <ubizjak@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR target/116738
* config/i386/subst.md (mask_scalar_operand_arg34,
mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
subst attributes.
* config/i386/sse.md
(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
Change from define_insn to define_expand, rename the old define_insn
to ...
(*<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
... this.
(<sse>_ieee_vm<ieee_maxmin><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
New define_insn.
* gcc.target/i386/sse-pr116738.c: New test.
The test used vect_perm_short for the vectorized scanning but
vect_perm3_short for whether that's done with SLP. We're now
generally expecting SLP to be used, even as a fallback, so the
following adjusts both to match up, fixing the testsuite issue
reported on powerpc64.
PR testsuite/116784
* gcc.dg/vect/slp-perm-9.c: Use vect_perm_short also for
the SLP check.
I added some whitespace unintentionally in r15-3723-g284c03ec79ec20,
fix that.
gcc/testsuite/ChangeLog:
* gcc.dg/debug/btf/btf-datasec-1.c: Fix whitespace.
There were only a few uses of PHI_RESULT_PTR, so let's remove it and use gimple_phi_result_ptr
or gimple_phi_result directly instead.
Since I was modifying ssa-iterators.h for the use of PHI_RESULT_PTR, change the use
of PHI_RESULT there to be gimple_phi_result instead.
This also removes one extra indirection that was done for PHI_RESULT so stage2 building
should be slightly faster.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/116643
gcc/ChangeLog:
* ssa-iterators.h (single_phi_def): Use gimple_phi_result
instead of PHI_RESULT.
(op_iter_init_phidef): Use gimple_phi_result/gimple_phi_result_ptr
instead of PHI_RESULT/PHI_RESULT_PTR.
* tree-ssa-operands.h (PHI_RESULT_PTR): Remove.
(PHI_RESULT): Use gimple_phi_result directly.
(SET_PHI_RESULT): Use gimple_phi_result_ptr directly.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This PR points out that we're not implementing [dcl.fct.def.default]
properly. Consider e.g.
struct C {
C(const C&&) = default;
};
where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted. There's an exception for
assignment operators in which case the program is ill-formed.
clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.
When the code is ill-formed, we emit an error in all modes. Otherwise,
we emit a pedwarn in C++17 and a warning in C++20.
PR c++/116162
gcc/c-family/ChangeLog:
* c.opt (Wdefaulted-function-deleted): New.
gcc/cp/ChangeLog:
* class.cc (check_bases_and_members): Don't set DECL_DELETED_FN here,
leave it to defaulted_late_check.
* cp-tree.h (maybe_delete_defaulted_fn): Declare.
(defaulted_late_check): Add a tristate parameter.
* method.cc (maybe_delete_defaulted_fn): New.
(defaulted_late_check): Add a tristate parameter. Call
maybe_delete_defaulted_fn instead of giving an error.
gcc/ChangeLog:
* doc/invoke.texi: Document -Wdefaulted-function-deleted.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/defaulted15.C: Add dg-warning/dg-error.
* g++.dg/cpp0x/defaulted51.C: Likewise.
* g++.dg/cpp0x/defaulted52.C: Likewise.
* g++.dg/cpp0x/defaulted53.C: Likewise.
* g++.dg/cpp0x/defaulted54.C: Likewise.
* g++.dg/cpp0x/defaulted56.C: Likewise.
* g++.dg/cpp0x/defaulted57.C: Likewise.
* g++.dg/cpp0x/defaulted58.C: Likewise.
* g++.dg/cpp0x/defaulted59.C: Likewise.
* g++.dg/cpp0x/defaulted63.C: New test.
* g++.dg/cpp0x/defaulted64.C: New test.
* g++.dg/cpp0x/defaulted65.C: New test.
* g++.dg/cpp0x/defaulted66.C: New test.
* g++.dg/cpp0x/defaulted67.C: New test.
* g++.dg/cpp0x/defaulted68.C: New test.
* g++.dg/cpp0x/defaulted69.C: New test.
* g++.dg/cpp23/defaulted1.C: New test.
Similarly to the previous patch, dwarf2asm.cc had
HAVE_DESIGNATED_INITIALIZERS support, and, as a fallback, a huge switch.
The switch, from what I can see, is expanded as a jump table with 256
label pointers, and the code at those labels then loads the addresses of
string literals.
The following patch instead uses a table with 256 const char * pointers,
NULL for ICE, non-NULL for returning something, similarly to the
HAVE_DESIGNATED_INITIALIZERS case.
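A hedged sketch (not the actual dwarf2asm.cc code) of the general idea:
build the 256-entry name table once with a constexpr constructor, which is
plain C++14 and needs no designated initializers, then index it instead of
going through a 256-case switch; a null entry still corresponds to the old
ICE path.
#include <cassert>

struct eh_format_names
{
  const char *names[0x100];
  constexpr eh_format_names () : names ()
  {
    names[0x00] = "DW_EH_PE_absptr";
    names[0x01] = "DW_EH_PE_uleb128";
    /* ... the remaining DW_EH_PE_* encoding combinations ...  */
  }
};

static constexpr eh_format_names eh_names;

const char *
eh_data_format_name_sketch (int format)
{
  const char *name = eh_names.names[format & 0xff];
  assert (name != nullptr);   /* a null entry means an invalid encoding */
  return name;
}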
2024-09-19 Jakub Jelinek <jakub@redhat.com>
* dwarf2asm.cc (eh_data_format_name): Use constexpr initialization
of format_names table for C++14 instead of a large switch.
This patch would like to add testcases of the signed scalar SAT_ADD
for form 2. Aka:
Form 2:
#define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
T __attribute__((noinline)) \
sat_s_add_##T##_fmt_2 (T x, T y) \
{ \
T sum = (UT)x + (UT)y; \
if ((x ^ y) < 0 || (sum ^ x) >= 0) \
return sum; \
return x < 0 ? MIN : MAX; \
}
DEF_SAT_S_ADD_FMT_2 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
The below tests are passed for this patch.
* The rv64gcv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-5.c: New test.
* gcc.target/riscv/sat_s_add-6.c: New test.
* gcc.target/riscv/sat_s_add-7.c: New test.
* gcc.target/riscv/sat_s_add-8.c: New test.
* gcc.target/riscv/sat_s_add-run-5.c: New test.
* gcc.target/riscv/sat_s_add-run-6.c: New test.
* gcc.target/riscv/sat_s_add-run-7.c: New test.
* gcc.target/riscv/sat_s_add-run-8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
The following reverts a bogus fix done for PR101009 and instead makes
sure we get into the same_access_functions () case when computing
the distance vector for g[1] and g[1] where the constants ended up
having different types. The generic code doesn't seem to handle
loop invariant dependences. The special case gets us both
( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ),
which the PR101009 fix changed to ( 0 ) with bad effects on other
cases as shown in this PR.
PR tree-optimization/116768
* tree-data-ref.cc (build_classic_dist_vector_1): Revert
PR101009 change.
* tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1
and (int)1 compare equal.
* gcc.dg/torture/pr116768.c: New testcase.
The following changes the fallback of disabling SLP when any of the
discovered SLP instances fails to pass vectorization checking into
a fallback that emulates with SLP what no-SLP would do: force
single-lane discovery for all instances.
The patch does not remove the final fallback to disable SLP but it
reduces the fallout from failing vectorization when any non-SLP
stmt survives analysis.
* tree-vectorizer.h (vect_analyze_slp): Add force_single_lane
parameter.
* tree-vect-slp.cc (vect_analyze_slp_instance): Remove
defaulting of force_single_lane.
(vect_build_slp_instance): Likewise. Pass down appropriate
force_single_lane.
(vect_analyze_slp): Add force_single_lane parameter and pass
it down appropriately.
(vect_slp_analyze_bb_1): Always do multi-lane SLP.
* tree-vect-loop.cc (vect_analyze_loop_2): Track two SLP
modes and adjust accordingly.
(vect_analyze_loop_1): Save the SLP mode when unrolling.
* gcc.dg/vect/vect-outer-slp-1.c: Adjust.
The use of #pragma GCC system_header in libstdc++ has led to bugs going
undetected for a while due to the silencing of compiler warnings that would
have revealed them promptly, and also interferes with warnings about
problematic template instantiations induced by user code.
But removing it, or even compiling with -Wsystem-header, is also problematic
due to warnings about deliberate uses of extensions.
So this patch adds #pragma GCC diagnostic as needed to suppress these
warnings.
The change to acinclude.m4 changes -Wabi to warn only in comparison to ABI
19, to avoid lots of warnings about the fact that we now mangle concept requirements, which
are in any case still experimental. I checked for any other changes against
ABI v15, and found only the <format> lambda mangling, which we can ignore.
This also enables -Wsystem-headers while building the library, so we see any
warnings not silenced by these #pragmas.
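As an illustration (the warning option and typedef names here are made up,
not taken from a specific header), the pattern added is the usual
#pragma GCC diagnostic dance around a deliberate use of an extension:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wlong-long"  // deliberate extension in C++98 mode
typedef long long          example_int64;
typedef unsigned long long example_uint64;
#pragma GCC diagnostic pop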
libstdc++-v3/ChangeLog:
* include/bits/algorithmfwd.h:
* include/bits/allocator.h:
* include/bits/codecvt.h:
* include/bits/concept_check.h:
* include/bits/cpp_type_traits.h:
* include/bits/hashtable.h:
* include/bits/iterator_concepts.h:
* include/bits/ostream_insert.h:
* include/bits/ranges_base.h:
* include/bits/regex_automaton.h:
* include/bits/std_abs.h:
* include/bits/stl_algo.h:
* include/c_compatibility/fenv.h:
* include/c_compatibility/inttypes.h:
* include/c_compatibility/stdint.h:
* include/ext/concurrence.h:
* include/ext/type_traits.h:
* testsuite/ext/type_traits/add_unsigned_floating_neg.cc:
* testsuite/ext/type_traits/add_unsigned_integer_neg.cc:
* testsuite/ext/type_traits/remove_unsigned_floating_neg.cc:
* testsuite/ext/type_traits/remove_unsigned_integer_neg.cc:
* include/bits/basic_ios.tcc:
* include/bits/basic_string.tcc:
* include/bits/fstream.tcc:
* include/bits/istream.tcc:
* include/bits/locale_classes.tcc:
* include/bits/locale_facets.tcc:
* include/bits/ostream.tcc:
* include/bits/regex_compiler.tcc:
* include/bits/sstream.tcc:
* include/bits/streambuf.tcc:
* configure: Regenerate.
* include/bits/c++config:
* include/c/cassert:
* include/c/cctype:
* include/c/cerrno:
* include/c/cfloat:
* include/c/climits:
* include/c/clocale:
* include/c/cmath:
* include/c/csetjmp:
* include/c/csignal:
* include/c/cstdarg:
* include/c/cstddef:
* include/c/cstdio:
* include/c/cstdlib:
* include/c/cstring:
* include/c/ctime:
* include/c/cwchar:
* include/c/cwctype:
* include/c_global/climits:
* include/c_global/cmath:
* include/c_global/cstddef:
* include/c_global/cstdlib:
* include/decimal/decimal:
* include/ext/rope:
* include/std/any:
* include/std/charconv:
* include/std/complex:
* include/std/coroutine:
* include/std/format:
* include/std/iomanip:
* include/std/limits:
* include/std/numbers:
* include/tr1/functional:
* include/tr1/tuple:
* include/tr1/type_traits:
* libsupc++/compare:
* libsupc++/new: Add #pragma GCC diagnostic to suppress
undesired warnings.
* acinclude.m4: Change -Wabi version from 2 to 19.
gcc/ChangeLog:
* ginclude/stdint-wrap.h: Add #pragma GCC diagnostic to suppress
undesired warnings.
* gsyslimits.h: Likewise.
There's special-casing for equal access functions which bypasses
printing the distance vectors. The following makes sure we always print
them, which helps debugging.
* tree-data-ref.cc (build_classic_dist_vector): Move
distance vector dumping to single caller ...
(subscript_dependence_tester): ... here, dumping always
when we succeed computing it.
The following restores the use of .SELECT_VL for testcases where it
is safe to use even when using SLP. For now I've restricted it
to single-lane SLP, plus optimistically allowing store-lane nodes
and assuming single-lane roots are not widened except at most to
load-lane, which should be fine.
PR tree-optimization/116573
* tree-vect-loop.cc (vect_analyze_loop_2): Allow .SELECT_VL
for SLP but disable it when there's multi-lane instances.
* tree-vect-stmts.cc (vectorizable_store): Only compute the
ptr increment when generating code.
(vectorizable_load): Likewise.
Build a derived type component's type only, when it is not already being
built and the component uses pointer semantics.
gcc/fortran/ChangeLog:
PR fortran/106606
* trans-types.cc (gfc_get_derived_type): Only build non-pointer
derived types as component's types when they are not yet built.
gcc/testsuite/ChangeLog:
* gfortran.dg/recursive_alloc_comp_5.f90: New test.
This patch would like to fix the dump check times of vector SAT_ADD. The
middle-end change makes the match times from 2 to 4 times.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Adjust
the dump check times from 2 to 4.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
This patch would like to leverage the match_cond_with_binary_phi to
match the phi on cond, and get the true/false arg if matched. This
helps a lot to simplify the implementation of gen_phi_on_cond.
Before this patch:
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
{
basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
&& EDGE_COUNT (_other_db_1->succs) == 1
&& EDGE_PRED (_other_db_1, 0)->src == _db_1)
{
tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
...
After this patch:
basic_block _b1 = gimple_bb (_a1);
tree _p1, _p2;
gcond *_cond_1 = match_cond_with_binary_phi (_a1, &_p1, &_p2);
if (_cond_1)
{
tree _cond_lhs_1 = gimple_cond_lhs (_cond_1);
tree _cond_rhs_1 = gimple_cond_rhs (_cond_1);
tree _p0 = build2 (gimple_cond_code (_cond_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
...
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* genmatch.cc (dt_operand::gen_phi_on_cond): Leverage the
match_cond_with_binary_phi API to get cond gimple, true and
false TREE arg.
Signed-off-by: Pan Li <pan2.li@intel.com>
Fix deep-copy code for allocatable components in nested derived-type
structures that was generated, but not inserted, when the copy had to be
done in a coarray. Additionally fix a comment.
gcc/fortran/ChangeLog:
PR fortran/85002
* trans-array.cc (duplicate_allocatable_coarray): Allow adding
of deep copy code in the when-allocated case. Add bounds
computation before condition, because coarrays need the bounds
also when not allocated.
(structure_alloc_comps): Duplication in the coarray case is done
already, omit it. Add the deep-copy code when duplicating a coarray.
* trans-expr.cc (gfc_trans_structure_assign): Fix comment.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/alloc_comp_9.f90: New test.
As recently implemented for svdiv, this patch folds svmul to a zero
vector if one of the operands is a zero vector. This transformation is
applied if at least one of the following conditions is met:
- the first operand is all zeros or
- the second operand is all zeros, and the predicate is ptrue or the
predication is _x or _z.
In contrast to constant folding, which was implemented in a previous
patch, this transformation is applied as soon as one of the operands is
a zero vector, while the other operand can be a variable.
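A hedged sketch of a call that now folds to a zero vector (the function
name and element type are chosen for illustration only):
#include <arm_sve.h>

svint32_t
mul_by_zero (svbool_t pg, svint32_t x)
{
  /* The second operand is all zeros and the predication is _x, so the
     whole multiplication folds to a zero vector.  */
  return svmul_x (pg, x, svdup_s32 (0));
}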
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Add folding of all-zero operands to zero vector.
gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_mul_1.c: Adjust expected
outcome.
* gcc.target/aarch64/sve/fold_mul_zero.c: New test.
This is a small patch that sets the L1 cache line size for Neoverse V2.
Unlike the other cache-related constants in there this value is not used just
for SW prefetch generation (which we want to avoid for Neoverse V2 presently).
It's also used to set std::hardware_destructive_interference_size.
See the links and recent discussions in PR116662 for reference.
Some CPU tunings in aarch64 set this value to something useful, but for
generic tuning we use the conservative 256, which forces 256-byte alignment
in such atomic structures. Using a smaller value can decrease the size of such
structs during layout and should not present an ABI problem as
std::hardware_destructive_interference_size is not intended to be used for structs
in an external interface, and GCC warns about such uses.
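For example (a hedged illustration, not taken from the testsuite), with the
generic 256-byte value each member below is forced to 256-byte alignment,
while the Neoverse V2 value of 64 lets the struct shrink accordingly:
#include <new>

struct counters
{
  alignas (std::hardware_destructive_interference_size) long produced;
  alignas (std::hardware_destructive_interference_size) long consumed;
};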
Another place where the L1 cache line size is used is in phiopt for
-fhoist-adjacent-loads where conditional accesses to adjacent struct members
can be speculatively loaded as long as they are within the same L1 cache line.
e.g.
struct S { int i; int j; };
int
bar (struct S *x, int y)
{
int r;
if (y)
r = x->i;
else
r = x->j;
return r;
}
The Neoverse V2 L1 cache line is 64 bytes according to the TRM, so set it to
that. The rest of the prefetch parameters inherit from the generic tuning so
we don't do anything extra for software prefetches.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
* config/aarch64/tuning_models/neoversev2.h (neoversev2_prefetch_tune):
Define.
(neoversev2_tunings): Use it.
The memory attr of some instructions should be 'load', but these are
currently 'none'.
gcc/ChangeLog:
* config/i386/i386.md: Add ssemov2, sseicvt2.
* config/i386/sse.md (sse2_cvtsi2sd): Apply sseicvt2.
(sse2_cvtsi2sdq<round_name>): Ditto.
(vec_set<mode>_0): Apply ssemov2 for 4, 6.
When matching the cond with a 2-arg phi node, we need to figure out
which arg of the phi node comes from the true edge of the cond block and
which from the false edge. This patch would like to add an interface
to perform that action and return the true and false args as TREE.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* gimple-match-head.cc (match_cond_with_binary_phi): Add new func
impl to match binary phi for true and false arg.
Signed-off-by: Pan Li <pan2.li@intel.com>
Since r15-3539, there are requests coming in to add other alias option
documentation. This patch will add all of them, including corei7, corei7-avx,
core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids.
Also in the patch, I reordered that part of the documentation; currently all
the CPUs/products are just all over the place. I regrouped them by
date-to-now products (from the very first CPU to the latest Panther Lake), P-core
(since the clients became hybrid cores, starting from Sapphire Rapids) and
E-core (from Bonnell to the latest Clearwater Forest).
And in the patch, I refined the product names in documentation.
gcc/ChangeLog:
* doc/invoke.texi: Add corei7, corei7-avx, core-avx-i,
core-avx2, atom, slm, gracemont and emeraldrapids. Reorder
the -march documentation by splitting them into date-to-now
products, P-core and E-core. Refine the product names in
documentation.
In commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2
testcases, but we missed the following four.
The tests do not FAIL since the binutils part hasn't been merged
yet, which leads to UNSUPPORTED tests. But avx512f-mask-type.h
needs to be included, otherwise it will be a compile error.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Include
avx512f-mask-type.h.
* gcc.target/i386/avx10_2-vminmaxsd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsh-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxss-2.c: Ditto.
This test awkwardly "blinks"; xfails and xpasses apparently
randomly for cris-elf using the "gdb simulator". On
inspection, I see that the stack address depends on the
number of environment variables, deliberately passed to the
simulator, each adding the size of a pointer.
This test is IMHO important enough not to be skipped
just because it blinks (fixing the actual problem is a
different task).
I guess a random non-16 stack-alignment could happen for
other targets as well, so let's try to add generic
machinery to "stabilize" the test as failing, by allocating
a dynamic amount to make sure it's misaligned. The most
target-dependent item here is an offset between the incoming
stack-pointer value (within main in the added framework) and
outgoing (within "xmain" as called from main when setting up
the p0 parameter). I know there are other wonderful stack
shapes, but such targets would fall under the "complicated
situations"-label and are no worse off than before.
* gcc.dg/pr84877.c: Try to make the test result consistent by
misaligning the stack.