Commit Graph

John David Anglin
b468821eea Skip several analyzer socket tests on hppa*-*-hpux*
2024-01-14  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	PR analyzer/113150
	* c-c++-common/analyzer/fd-glibc-byte-stream-socket.c: Skip
	on hppa*-*-hpux*.
	* c-c++-common/analyzer/fd-manpage-getaddrinfo-client.c: Likewise.
	* c-c++-common/analyzer/fd-manpage-getaddrinfo-server.c: Likewise.
	* c-c++-common/analyzer/fd-symbolic-socket.c: Likewise.
	* gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c: Likewise.
2024-01-14 18:23:51 +00:00
Georg-Johann Lay
48448055fb AVR: Support .rodata in Flash for AVR64* and AVR128* Devices.
These devices see a 32 KiB block of their program memory (flash) in
the RAM address space.  This can be used to support .rodata in flash
provided Binutils support PR31124 (Add new emulations which locate
.rodata in flash).  This patch does the following:

* configure checks availability of Binutils PR31124.

* Add new command line options -mrodata-in-ram and -mflmap.
While -mflmap is for internal usage (it communicates hardware properties
from device-specs to the compiler proper), -mrodata-in-ram is a user-facing
option that allows returning to the current rodata-in-ram layout.

* Adjust gen-avr-mmcu-specs.cc so that the generated device-specs
sanity-check the options and translate -m[no-]rodata-in-ram to the
proper emulation.

* Objects in .rodata don't drag in __do_copy_data.

* Document new options and built-in macros.
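
As a hedged usage sketch (the device name and -Os are just examples;
.rodata lands in flash only when Binutils provides the PR31124
emulations):

	avr-gcc -mmcu=avr128db48 -Os main.c                   # .rodata in flash
	avr-gcc -mmcu=avr128db48 -Os -mrodata-in-ram main.c   # old RAM layout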

	PR target/112944

gcc/
	* configure.ac [target=avr]: Check availability of emulations
	avrxmega2_flmap and avrxmega4_flmap, resulting in new config vars
	HAVE_LD_AVR_AVRXMEGA2_FLMAP and HAVE_LD_AVR_AVRXMEGA4_FLMAP.
	* configure: Regenerate.
	* config.in: Regenerate.
	* doc/invoke.texi (AVR Options): Document -mflmap, -mrodata-in-ram,
	__AVR_HAVE_FLMAP__, __AVR_RODATA_IN_RAM__.

	* config/avr/avr.opt (-mflmap, -mrodata-in-ram): New options.
	* config/avr/avr-arch.h (enum avr_device_specific_features):
	Add AVR_ISA_FLMAP.
	* config/avr/avr-mcus.def (AVR_MCU) [avr64*, avr128*]: Set isa flag
	AVR_ISA_FLMAP.
	* config/avr/avr.cc (avr_arch_index, avr_has_rodata_p): New vars.
	(avr_set_core_architecture): Set avr_arch_index.
	(have_avrxmega2_flmap, have_avrxmega4_flmap)
	(have_avrxmega3_rodata_in_flash): Set new static const bool according
	to configure results.
	(avr_rodata_in_flash_p): New function using them.
	(avr_asm_init_sections): Let readonly_data_section->unnamed.callback
	track avr_need_copy_data_p only if not avr_rodata_in_flash_p().
	(avr_asm_named_section): Track avr_has_rodata_p.
	(avr_file_end): Emit __do_copy_data also when avr_has_rodata_p
	and not avr_rodata_in_flash_p ().
	* config/avr/specs.h (CC1_SPEC): Add %(cc1_rodata_in_ram).
	(LINK_SPEC): Add %(link_rodata_in_ram).
	(LINK_ARCH_SPEC): Remove.
	* config/avr/gen-avr-mmcu-specs.cc (have_avrxmega3_rodata_in_flash)
	(have_avrxmega2_flmap, have_avrxmega4_flmap): Set new static
	const bool according to configure results.
	(diagnose_mrodata_in_ram): New function.
	(print_mcu): Generate specs with the following changes:
	<*cc1_misc, *asm_misc, *link_misc>: New specs so that we don't
	need to extend avr/specs.h each time we add a new bell or whistle.
	<*cc1_rodata_in_ram, *link_rodata_in_ram>: New specs to diagnose
	-m[no-]rodata-in-ram.
	<*cpp_rodata_in_ram>: New. Does -D__AVR_RODATA_IN_RAM__=0/1.
	<*cpp_mcu>: Add -D__AVR_AVR_FLMAP__ if it applies.
	<*cpp>: Add %(cpp_rodata_in_ram).
	<*link_arch>: Use emulation avrxmega2_flmap, avrxmega4_flmap as
	requested.
	<*self_spec>: Add -mflmap or %<mflmap as needed.

gcc/testsuite/
	* gcc.target/avr/torture/pr112944-flmap-0.c: New test.
	* gcc.target/avr/torture/pr112944-flmap-1.c: New test.
2024-01-14 19:11:24 +01:00
Jeff Law
e927cfa842 [committed] Fix MIPS bootstrap
mips bootstraps have been broken for a while.  They've been triggering an error
about mutually exclusive equal-tests always being false when building
gencondmd.

This was ultimately tracked down to the ior<mode>3_mips16_asmacro pattern.  The
pattern uses the GPR mode iterator which looks like this:

(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])

The condition for the pattern looks like this:

  "ISA_HAS_MIPS16E2"

And if you dig into ISA_HAS_MIPS16E2:

/* The MIPS16e V2 instructions are available.  */
#define ISA_HAS_MIPS16E2	(TARGET_MIPS16 && TARGET_MIPS16E2 \
				 && !TARGET_64BIT)

The way the mode iterator is handled is by adding its condition to the
pattern's condition when we expand copies of the pattern resulting in this
condition for one of the two generated patterns:

(TARGET_MIPS16 && TARGET_MIPS16E2 && !TARGET_64BIT) && TARGET_64BIT

This can never be true because of the TARGET_64BIT tests.

The fix is trivial.  Don't use a mode iterator on that pattern.

Bootstrapped on mips64el.  I don't have any tests to compare against, so no
regression test data.

gcc/
	* config/mips/mips.md (ior<mode>3_mips16_asmacro): Use SImode,
	not the GPR iterator.  Adjust pattern name and mode attribute
	accordingly.
2024-01-14 07:53:49 -07:00
GCC Administrator
ed5bf2080c Daily bump. 2024-01-14 00:17:47 +00:00
Harald Anlauf
20da561652 Fortran: intrinsic ISHFTC and missing optional argument SIZE [PR67277]
gcc/fortran/ChangeLog:

	PR fortran/67277
	* trans-intrinsic.cc (gfc_conv_intrinsic_ishftc): Handle optional
	dummy argument for SIZE passed to ISHFTC.  Set default value to
	BIT_SIZE(I) when missing.

gcc/testsuite/ChangeLog:

	PR fortran/67277
	* gfortran.dg/ishftc_optional_size_1.f90: New test.
2024-01-13 22:00:21 +01:00
John David Anglin
3f235afacf hppa64: Fix fmt_f_default_field_width_3.f90 and fmt_g_default_field_width_3.f90
The hppa*64*-*-hpux* target is not included in the set of fortran_real_16
targets because it doesn't have cosl.  However, these tests don't need
cosl, etc.

2024-01-13  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	* gfortran.dg/fmt_f_default_field_width_3.f90: Add hppa*64*-*-hpux*
	to real_16 dg-error targets.
	* gfortran.dg/fmt_g_default_field_width_3.f90: Likewise.
2024-01-13 18:06:21 +00:00
Harald Anlauf
9935667a69 Fortran: annotations for DO CONCURRENT loops [PR113305]
gcc/fortran/ChangeLog:

	PR fortran/113305
	* gfortran.h (gfc_loop_annot): New.
	(gfc_iterator, gfc_forall_iterator): Use for annotation control.
	* array.cc (gfc_copy_iterator): Adjust.
	* gfortran.texi: Document annotations IVDEP, UNROLL n, VECTOR,
	NOVECTOR as applied to DO CONCURRENT.
	* parse.cc (parse_do_block): Parse annotations IVDEP, UNROLL n,
	VECTOR, NOVECTOR as applied to DO CONCURRENT.  Apply UNROLL only to
	first loop control variable.
	* trans-stmt.cc (iter_info): Use gfc_loop_annot.
	(gfc_trans_simple_do): Adjust.
	(gfc_trans_forall_loop): Annotate loops with IVDEP, UNROLL n,
	VECTOR, NOVECTOR as needed for DO CONCURRENT.
	(gfc_trans_forall_1): Handle loop annotations.

gcc/testsuite/ChangeLog:

	PR fortran/113305
	* gfortran.dg/do_concurrent_7.f90: New test.
2024-01-13 14:47:24 +01:00
Jonathan Wakely
f8a5298c97 libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]
This is the last part of PR libstdc++/108822 implementing P2255R2, which
makes it ill-formed to create a std::tuple that would bind a reference
to a temporary.

The dangling checks are implemented as deleted constructors for C++20
and higher, and as Debug Mode static assertions in the constructor body
for older standards. This is similar to the r13-6084-g916ce577ad109b
changes for std::pair.

As part of this change, I've reimplemented most of std::tuple for C++20,
making use of concepts to replace the enable_if constraints, and using
conditional explicit to avoid duplicating most constructors. We could
use conditional explicit for the C++11 implementation too (with pragmas
to disable the -Wc++17-extensions warnings), but that should be done as
a stage 1 change for GCC 15 rather than now.

The partial specialization for std::tuple<T1, T2> is no longer used for
C++20 (or more precisely, for a C++20 compiler that supports concepts
and conditional explicit). The additional constructors and assignment
operators that take std::pair arguments have been added to the C++20
implementation of the primary template, with sizeof...(_Elements)==2
constraints. This avoids reimplementing all the other constructors in
the std::tuple<T1, T2> partial specialization to use concepts. This way
we avoid four implementations of every constructor and only have three!
(The primary template has an implementation of each constructor for
C++11 and another for C++20, and the tuple<T1,T2> specialization has an
implementation of each for C++11, so that's three for each constructor.)

In order to make the constraints more efficient on the C++20 version of
the default constructor I've also added a variable template for the
__is_implicitly_default_constructible trait, implemented using concepts.
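
A hedged illustration of the newly rejected case (my example, not the
new testcase):

  #include <tuple>
  // Binds a const int& to a temporary materialized from 42: a deleted
  // constructor in C++20, a Debug Mode assertion in older standards.
  std::tuple<const int&> t(42);   // error: dangling reference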

libstdc++-v3/ChangeLog:

	PR libstdc++/108822
	* include/std/tuple (tuple): Add checks for dangling references.
	Reimplement constraints and constant expressions using C++20
	features.
	* include/std/type_traits [C++20]
	(__is_implicitly_default_constructible_v): Define.
	(__is_implicitly_default_constructible): Use variable template.
	* testsuite/20_util/tuple/dangling_ref.cc: New test.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-13 11:14:09 +00:00
Jakub Jelinek
f77a87982d lower-bitint: Fix up handle_operand_addr INTEGER_CST handling [PR113361]
As the testcase shows, the INTEGER_CST handling in handle_operand_addr
(i.e. what is used when passing address of an integer to a bitint library
routine) wasn't correct.  If the minimum precision to represent an
INTEGER_CST is smaller or equal to limb_prec, the code correctly uses
m_limb_type; if the minimum precision of a _BitInt INTEGER_CST is large
enough such that the bitint is middle, large or huge, everything is fine
too.  But the code wasn't handling correctly e.g. __int128 constants which
need more than limb_prec bits or _BitInt constants which on the architecture
are considered small (say have DImode limb_mode, TImode abi_limb_mode and
for [65, 128] bits use TImode scalar like the proposed aarch64 patch).
Best would be to use an array of 2/3/4 limbs in that case, but we'd need to
convert the INTEGER_CST to a CONSTRUCTOR in the right endianity etc.,
so the code was using mid_min_prec to enforce a middle _BitInt precision.
Except that mid_min_prec can be 0 and not computed yet, or it doesn't have
to be the smallest middle _BitInt precision, just the smallest so far
encountered.  So, on the testcase one possibility was that it used precision
65 from mid_min_prec, even when the INTEGER_CST actually needed larger
minimum precision (96 bits at least), or crashed when mid_min_prec was 0.

The patch fixes it in 2 hunks, the first makes sure we actually try to
create a BITINT_TYPE for the > limb_prec cases like __int128, and the second
instead of using mid_min_prec attempts to increase mp precision until it
isn't small anymore.

2024-01-13  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/113361
	* gimple-lower-bitint.cc (bitint_large_huge::handle_operand_addr):
	Fix up determination of the type for > limb_prec constants.

	* gcc.dg/torture/bitint-47.c: New test.
2024-01-13 11:29:15 +01:00
Jakub Jelinek
7012a25252 testsuite: Fix up vect-early-break_100-pr113287.c testcase [PR113287]
When the testcase was being adjusted for unsigned long -> unsigned long long,
two spots using long weren't changed to long long, so the testcase still warns
about UB in shifts.

2024-01-13  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/113287
	* gcc.dg/vect/vect-early-break_100-pr113287.c: Use long long instead
	of long.
2024-01-13 10:46:51 +01:00
Jakub Jelinek
65388b2865 c++, demangle: Implement https://github.com/itanium-cxx-abi/cxx-abi/issues/148 non-proposal
The following patch attempts to implement what apparently clang++
implemented for explicit object member function mangling, but nobody
actually proposed in patch form in
https://github.com/itanium-cxx-abi/cxx-abi/issues/148
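
A hedged illustration (my example; the mangled string reflects my
reading of the H scheme, not a string from the testsuite):

  struct S {
    void f(this S) {}   // explicit object member function
  };
  // The nested name gets H after N, giving something like _ZNH1S1fES_.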

2024-01-13  Jakub Jelinek  <jakub@redhat.com>

gcc/cp/
	* mangle.cc (write_nested_name): Mangle explicit object
	member functions with H as per
	https://github.com/itanium-cxx-abi/cxx-abi/issues/148 non-proposal.
gcc/testsuite/
	* g++.dg/abi/mangle79.C: New test.
include/
	* demangle.h (enum demangle_component_type): Add
	DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
libiberty/
	* cp-demangle.c (FNQUAL_COMPONENT_CASE): Add case for
	DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
	(d_dump): Handle DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
	(d_nested_name): Parse H after N in nested name.
	(d_count_templates_scopes): Handle
	DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
	(d_print_mod): Likewise.
	(d_print_function_type): Likewise.
	* testsuite/demangle-expected: Add tests for explicit object
	member functions.
2024-01-13 10:23:53 +01:00
Andrew Pinski
34a827039f Add a few testcases for fix missed optimization regressions
Adds a few new testcases for some missed optimization regressions.
The analysis on how each should be optimized is in the testcases
themselves (and in the bug report).

Committed as obvious after running the testsuite to make sure they pass.

	PR tree-optimization/107823
	PR tree-optimization/110768
	PR tree-optimization/110941
	PR tree-optimization/110450
	PR tree-optimization/110841

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-thread-22.c: New test.
	* gcc.dg/tree-ssa/vrp-loop-1.c: New test.
	* gcc.dg/tree-ssa/vrp-loop-2.c: New test.
	* gcc.dg/tree-ssa/vrp-unreachable-1.c: New test.
	* gcc.dg/tree-ssa/vrp-unreachable-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-12 20:38:06 -08:00
Patrick Palka
ac1a399bf6 libstdc++: Implement C++23 std::bind_back from P2387R3 [PR108827]
The implementation is based on std::bind_front.  Since this is a
C++23 feature, we use deducing this unconditionally.
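
A hedged usage sketch (my example, not from the new tests):

  #include <functional>
  auto minus_five = std::bind_back(std::minus<>{}, 5);
  int r = minus_five(42);   // calls std::minus<>{}(42, 5), so r == 37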

	PR libstdc++/108827
	PR libstdc++/111327

libstdc++-v3/ChangeLog:

	* include/bits/version.def (bind_back): Define.
	* include/bits/version.h: Regenerate.
	* include/std/functional (_Bind_back): Define for C++23.
	(bind_back): Likewise.
	* testsuite/20_util/function_objects/bind_back/1.cc: New test
	(adapted from corresponding bind_front test).
	* testsuite/20_util/function_objects/bind_back/111327.cc: Likewise.
2024-01-12 23:02:12 -05:00
Patrick Palka
3e1ffa7dd1 libstdc++: Use C++23 deducing this in std::bind_front
This simplifies the operator() of _Bind_front using C++23 deducing
this, allowing us to condense multiple operator() overloads into one.

In passing I think we can remove _Bind_front's defaulted special member
declarations and just let the compiler implicitly generate them for us.
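
A minimal sketch of the pattern (a toy analog, not the libstdc++ code):

  struct callable {
    int x = 42;
    // One deducing-this overload stands in for the &, const&, &&,
    // const&& quadruple of call operators.
    template<typename Self>
    int operator()(this Self&& self) { return self.x; }
  };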

libstdc++-v3/ChangeLog:

	* include/std/functional (_Bind_front): Remove =default special
	member function declarations.
	(_Bind_front::operator()): Implement using C++23 deducing this
	when available.
	* testsuite/20_util/function_objects/bind_front/111327.cc:
	Adjust testcase to expect better errors in C++23 mode.
2024-01-12 22:55:43 -05:00
Patrick Palka
c48bedd180 libstdc++/ranges: Use perfect forwarding in _Pipe and _Partial ctors
This avoids redundant moves when composing and partially applying range
adaptor objects.

libstdc++-v3/ChangeLog:

	* include/std/ranges (views::__adaptor::operator|): Perform
	perfect forwarding of arguments.
	(views::__adaptor::_RangeAdaptor::operator()): Pass dummy
	first argument to _Partial.
	(views::__adaptor::_Partial::_Partial): Likewise.  Add dummy
	first parameter.
	(views::__adaptor::_Pipe::_Pipe): Perform perfect forwarding
	of arguments.
	(to): Pass dummy first argument to _Partial.
2024-01-12 22:54:59 -05:00
GCC Administrator
444a31f3b3 Daily bump. 2024-01-13 00:18:48 +00:00
Jonathan Wakely
c224dec0e7 libstdc++: Fix non-portable results from 64-bit std::subtract_with_carry_engine [PR107466]
I implemented the resolution of LWG 3809 in r13-4364-ga64775a0edd469 but
it was recently noted in the MSVC STL github repo that the change causes
possible truncation for 64-bit seeds. Whether the truncation occurs (and
to what value) depends on the width of uint_least32_t which is not
portable, so the output of the PRNG for 64-bit seed values is no longer
the same as in C++20, and no longer portable across platforms.

That new issue was filed as LWG 4014. I proposed a new change which
reduces the seed by the LCG's modulus before the conversion to
uint_least32_t. This ensures that 64-bit seed values are consistently
reduced by the modulus before any truncation. This removes the
platform-dependent behaviour and restores the old behaviour for
std::subtract_with_carry_engine specializations using a 64-bit result
type (such as std::ranlux48_base).
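
A hedged sketch of the idea (simplified; not the actual libstdc++
code):

  #include <cstdint>
  std::uint_least32_t reduce_seed(unsigned long long seed)
  {
    // Modulus of the seeding LCG from [rand.eng.sub]; reducing first
    // makes the later truncation to uint_least32_t value-preserving.
    const unsigned long long m = 2147483563ULL;
    return static_cast<std::uint_least32_t>(seed % m);
  }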

libstdc++-v3/ChangeLog:

	PR libstdc++/107466
	* include/bits/random.tcc (subtract_with_carry_engine::seed):
	Implement proposed resolution of LWG 4014.
	* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
	line number.
	* testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc:
	Check for expected result of 64-bit engine with seed that
	doesn't fit in 32-bits.
2024-01-13 00:11:44 +00:00
Georg-Johann Lay
8b447fa89d AVR: Documentation: Web-Link an example ld-Script for Address-Space __flashN.
gcc/
	* doc/extend.texi (AVR Named Address Spaces, Limitations and Caveats):
	Add web-link to the avr-gcc wiki.
2024-01-12 19:01:07 +01:00
Georg-Johann Lay
45a22144bf AVR: Documentation: Attribute address has exactly one argument.
gcc/
	* doc/extend.texi (AVR Variable Attributes) [address]: Remove
	documentation for a version without argument, which is not supported.
2024-01-12 18:46:02 +01:00
Jason Merrill
27521a2f4f c++: __class_type_info and modules [PR113038]
Doing a dynamic_cast in both TUs broke because we were declaring a new
__class_type_info in _b that conflicted with the one imported in the global
module from _a.  It seems clear to me that any new class declaration in
the global module should merge with an imported definition, but for GCC 14
let's just fix this for the specific case of __class_type_info.

	PR c++/113038

gcc/cp/ChangeLog:

	* name-lookup.cc (lookup_elaborated_type): Look for bindings
	in the global namespace in the ABI namespace.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/pr106304_b.C: Add dynamic_cast.
2024-01-12 12:14:36 -05:00
Ezra Sitorus
9dadc9ccdd arm: vld1_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x4 variants of the vld1 intrinsic.

The previous vld1_x4 has been updated to vld1q_x4 to take into
account that it works with 4-word-length types. vld1_x4 is now
only for 2-word-length types.
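
A hedged usage sketch of one of the new D-register variants (my
example):

  #include <arm_neon.h>
  uint8x8x4_t load_u8_x4(const uint8_t *p)
  {
    return vld1_u8_x4(p);   /* loads 4 x 8 bytes into four D registers */
  }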

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_u64_x4): New.
	(vld1_s8_x4, vld1_s16_x4, vld1_s32_x4, vld1_s64_x4): New.
	(vld1_f16_x4, vld1_f32_x4): New.
	(vld1_p8_x4, vld1_p16_x4, vld1_p64_x4): New.
	(vld1_bf16_x4): New.
	(vld1q_types_x4): Updated to use vld1q_x4
	from arm_neon_builtins.def
	* config/arm/arm_neon_builtins.def
	(vld1_x4): Updated entries.
	(vld1q_x4): New entries, but comes from the old vld1_x4
	* config/arm/neon.md
	(neon_vld1q_x4<mode>): Updated from neon_vld1_x4<mode>.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vld1_base_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1_bf16_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1_fp16_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1_p64_xN_1.c: Updated.
2024-01-12 17:00:34 +00:00
Ezra Sitorus
4b887e59ac arm: vld1_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x3 variants of the vld1 intrinsic.

The previous vld1_x3 has been updated to vld1q_x3 to take into
account that it works with 4-word-length types. vld1_x3 is now
only for 2-word-length types.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vld1_u8_x3, vld1_u16_x3, vld1_u32_x3, vld1_u64_x3): New.
	(vld1_s8_x3, vld1_s16_x3, vld1_s32_x3, vld1_s64_x3): New.
	(vld1_f16_x3, vld1_f32_x3): New.
	(vld1_p8_x3, vld1_p16_x3, vld1_p64_x3): New.
	(vld1_bf16_x3): New.
	(vld1q_types_x3): Updated to use vld1q_x3 from
	arm_neon_builtins.def
	* config/arm/arm_neon_builtins.def
	(vld1_x3): Updated entries.
	(vld1q_x3): New entries, but comes from the old vld1_x3
	* config/arm/neon.md
	(neon_vld1q_x3<mode>): Updated from neon_vld1_x3<mode>.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vld1_base_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1_bf16_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1_fp16_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1_p64_xN_1.c: Updated.
2024-01-12 17:00:34 +00:00
Ezra Sitorus
d2b4ec9ea3 arm: vld1_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x2 variants of the vld1 intrinsic.

The previous vld1_x2 has been updated to vld1q_x2 to take into
account that it works with 4-word-length types. vld1_x2 is now
only for 2-word-length types.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vld1_u8_x2, vld1_u16_x2, vld1_u32_x2, vld1_u64_x2): New.
	(vld1_s8_x2, vld1_s16_x2, vld1_s32_x2, vld1_s64_x2): New.
	(vld1_f16_x2, vld1_f32_x2): New.
	(vld1_p8_x2, vld1_p16_x2, vld1_p64_x2): New.
	(vld1_bf16_x2): New.
	(vld1q_types_x2): Updated to use vld1q_x2 from
	arm_neon_builtins.def
	* config/arm/arm_neon_builtins.def
	(vld1_x2): Updated entries.
	(vld1q_x2): New entries, but comes from the old vld1_x2
	* config/arm/neon.md
	(neon_vld1<VMEMX2_q>_x2<VDQX:mode>): Updated from
	neon_vld1_x2<mode>.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
2024-01-12 17:00:34 +00:00
Ezra Sitorus
a52fdae91d arm: vst1q_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x4 variants of the vst1q intrinsic.
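
A hedged usage sketch (my example):

  #include <arm_neon.h>
  void store_u8_x4(uint8_t *p, uint8x16x4_t v)
  {
    vst1q_u8_x4(p, v);   /* stores 4 x 16 bytes from four Q registers */
  }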

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vst1q_u8_x4, vst1q_u16_x4, vst1q_u32_x4, vst1q_u64_x4): New.
	(vst1q_s8_x4, vst1q_s16_x4, vst1q_s32_x4, vst1q_s64_x4): New.
	(vst1q_f16_x4, vst1q_f32_x4): New.
	(vst1q_p8_x4, vst1q_p16_x4, vst1q_p64_x4): New.
	(vst1q_bf16_x4): New.
	* config/arm/arm_neon_builtins.def (vst1q_x4): New entries.
	* config/arm/neon.md
	(neon_vst1q_x4<mode>): New.
	(neon_vst1x4qa<mode>, neon_vst1x4qb<mode>): New.
	* config/arm/unspecs.md
	(UNSPEC_VST1X4A, UNSPEC_VST1X4B): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vst1q_base_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1q_p64_xN_1.c: Updated.
2024-01-12 17:00:34 +00:00
Ezra Sitorus
2a0e91d6c1 arm: vst1q_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x3 variants of the vst1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vst1q_u8_x3, vst1q_u16_x3, vst1q_u32_x3, vst1q_u64_x3): New.
	(vst1q_s8_x3, vst1q_s16_x3, vst1q_s32_x3, vst1q_s64_x3): New.
	(vst1q_f16_x3, vst1q_f32_x3): New.
	(vst1q_p8_x3, vst1q_p16_x3, vst1q_p64_x3): New.
	(vst1q_bf16_x3): New.
	* config/arm/arm_neon_builtins.def (vst1q_x3): New entries.
	* config/arm/neon.md
	(neon_vst1q_x3<mode>): New.
	(neon_vst1x3qa<mode>, neon_vst1x3qb<mode>): New.
	* config/arm/unspecs.md
	(UNSPEC_VST1X3A, UNSPEC_VST1X3B): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
2024-01-12 17:00:34 +00:00
Ezra Sitorus
ccf041daa7 arm: vst1q_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x2 variants of the vst1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vst1q_u8_x2, vst1q_u16_x2, vst1q_u32_x2, vst1q_u64_x2): New.
	(vst1q_s8_x2, vst1q_s16_x2, vst1q_s32_x2, vst1q_s64_x2): New.
	(vst1q_f16_x2, vst1q_f32_x2): New.
	(vst1q_p8_x2, vst1q_p16_x2, vst1q_p64_x2): New.
	(vst1q_bf16_x2): New.
	* config/arm/arm_neon_builtins.def (vst1q_x2): New entries.
	* config/arm/neon.md
	(neon_vst1<VMEMX2_q>_x2<VDQX:mode>): Updated from
	neon_vst1_x2<mode>.
	* config/arm/iterators.md
	(VMEMX2): New mode iterator.
	(VMEMX2_q): New mode attribute.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
2024-01-12 17:00:34 +00:00
Ezra Sitorus
221912d34e arm: vst1_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x4 variants of the vst1 intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vst1_u8_x4, vst1_u16_x4, vst1_u32_x4, vst1_u64_x4): New.
	(vst1_s8_x4, vst1_s16_x4, vst1_s32_x4, vst1_s64_x4): New.
	(vst1_f16_x4, vst1_f32_x4): New.
	(vst1_p8_x4, vst1_p16_x4, vst1_p64_x4): New.
	(vst1_bf16_x4): New.
	* config/arm/arm_neon_builtins.def (vst1_x4): New entries.
	* config/arm/neon.md (vst1_x4<mode>): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vst1_base_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1_bf16_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1_fp16_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1_p64_xN_1.c: Updated.
2024-01-12 17:00:33 +00:00
Ezra Sitorus
f06c6ad317 arm: vst1_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x3 variants of the vst1 intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vst1_u8_x3, vst1_u16_x3, vst1_u32_x3, vst1_u64_x3): New.
	(vst1_s8_x3, vst1_s16_x3, vst1_s32_x3, vst1_s64_x3): New.
	(vst1_f16_x3, vst1_f32_x3): New.
	(vst1_p8_x3, vst1_p16_x3, vst1_p64_x3): New.
	(vst1_bf16_x3): New.
	* config/arm/arm_neon_builtins.def (vst1_x3): New entries.
	* config/arm/neon.md (vst1_x3<mode>): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vst1_base_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1_bf16_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1_fp16_xN_1.c: Updated.
	* gcc.target/arm/simd/vst1_p64_xN_1.c: Updated.
2024-01-12 17:00:33 +00:00
Ezra Sitorus
2bd944af54 arm: vst1_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x2 variants of the vst1 intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vst1_u8_x2, vst1_u16_x2, vst1_u32_x2, vst1_u64_x2): New.
	(vst1_s8_x2, vst1_s16_x2, vst1_s32_x2, vst1_s64_x2): New.
	(vst1_f16_x2, vst1_f32_x2): New.
	(vst1_p8_x2, vst1_p16_x2, vst1_p64_x2): New.
	(vst1_bf16_x2): New.
	* config/arm/arm_neon_builtins.def (vst1_x2): New entries.
	* config/arm/neon.md (vst1_x2<mode>): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vst1_base_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vst1_p64_xN_1.c: Add new tests.
2024-01-12 17:00:33 +00:00
Ezra Sitorus
84d713f6f4 arm: vld1q_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x4 variants of the vld1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vld1q_u8_x4, vld1q_u16_x4, vld1q_u32_x4, vld1q_u64_x4): New.
	(vld1q_s8_x4, vld1q_s16_x4, vld1q_s32_x4, vld1q_s64_x4): New.
	(vld1q_f16_x4, vld1q_f32_x4): New.
	(vld1q_p8_x4, vld1q_p16_x4, vld1q_p64_x4): New.
	(vld1q_bf16_x4): New.
	* config/arm/arm_neon_builtins.def (vld1_x4): New entries.
	* config/arm/neon.md
	(neon_vld1_x4<mode>): New.
	(neon_vld1x4qa<mode>, neon_vld1x4qb<mode>): New.
	* config/arm/unspecs.md
	(UNSPEC_VLD1X4A, UNSPEC_VLD1X4B): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vld1q_base_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Updated.
	* gcc.target/arm/simd/vld1q_p64_xN_1.c: Updated.
2024-01-12 17:00:33 +00:00
Ezra Sitorus
c8ec3e1327 arm: vld1q_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x3 variants of the vld1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vld1q_u8_x3, vld1q_u16_x3, vld1q_u32_x3, vld1q_u64_x3): New.
	(vld1q_s8_x3, vld1q_s16_x3, vld1q_s32_x3, vld1q_s64_x3): New.
	(vld1q_f16_x3, vld1q_f32_x3): New.
	(vld1q_p8_x3, vld1q_p16_x3, vld1q_p64_x3): New.
	(vld1q_bf16_x3): New.
	* config/arm/arm_neon_builtins.def (vld1_x3): New entries.
	* config/arm/neon.md
	(neon_vld1_x3<mode>): New.
	(neon_vld1x3qa<mode>, neon_vld1x3qb<mode>): New.
	* config/arm/unspecs.md
	(UNSPEC_VLD1X3A, UNSPEC_VLD1X3B): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new tests.
	* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new tests.
2024-01-12 17:00:33 +00:00
Ezra Sitorus
ace4b8e7f9 arm: vld1q_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x2 variants of the vld1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
	* config/arm/arm_neon.h
	(vld1q_u8_x2, vld1q_u16_x2, vld1q_u32_x2, vld1q_u64_x2): New.
	(vld1q_s8_x2, vld1q_s16_x2, vld1q_s32_x2, vld1q_s64_x2): New.
	(vld1q_f16_x2, vld1q_f32_x2): New.
	(vld1q_p8_x2, vld1q_p16_x2, vld1q_p64_x2): New.
	(vld1q_bf16_x2): New.
	* config/arm/arm_neon_builtins.def (vld1_x2): New entries.
	* config/arm/neon.md (vld1_x2<mode>): New.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new test.
	* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new test.
	* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new test.
	* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new test.
2024-01-12 17:00:33 +00:00
Jakub Jelinek
95440171d0 c: Avoid _BitInt indexes > sizetype in ARRAY_REFs [PR113315]
When build_array_ref doesn't use ARRAY_REF, it casts the index to sizetype
already, performs POINTER_PLUS_EXPR and then dereferences.
While when emitting ARRAY_REF, we try to keep index expression as is in
whatever type it had, which is reasonable e.g. for signed or unsigned types
narrower than sizetype for loop optimizations etc.
But if the index is wider than sizetype, we are unnecessarily computing
bits beyond what is needed.  For {,unsigned }__int128 on 64-bit arches
or {,unsigned }long long on 32-bit arches we've been doing that for decades,
so the following patch doesn't propose to change that (might be stage1
material), but for _BitInt at least, the _BitInt lowering code doesn't expect
to see large/huge _BitInt in the ARRAY_REF indexes; I was expecting one
would see just casts of those to sizetype.

So, the following patch makes sure that large/huge _BitInt indexes don't
appear in ARRAY_REFs.
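
A hedged sketch of the kind of index this affects (my example, not one
of the new testcases):

  int f(int *p, unsigned _BitInt(192) i)
  {
    return p[i];   /* the index is now converted to sizetype first */
  }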

2024-01-12  Jakub Jelinek  <jakub@redhat.com>

	PR c/113315
	* c-typeck.cc (build_array_ref): If index has BITINT_TYPE type with
	precision larger than sizetype precision, convert it to sizetype.

	* gcc.dg/bitint-65.c: New test.
	* gcc.dg/bitint-66.c: New test.
2024-01-12 17:11:49 +01:00
Tamar Christina
d14ef0987d testsuite: Make bitint early vect test more accurate
This changes the tests I committed for PR113287 to also
run on targets that don't support bitint.

gcc/ChangeLog:

	PR tree-optimization/113287
	* doc/sourcebuild.texi (check_effective_target_bitint65535): New.

gcc/testsuite/ChangeLog:

	PR tree-optimization/113287
	* gcc.dg/vect/vect-early-break_100-pr113287.c: Support non-bitint.
	* gcc.dg/vect/vect-early-break_99-pr113287.c: Likewise.
	* lib/target-supports.exp (bitint, bitint128, bitint575, bitint65535):
	Document them.
2024-01-12 15:32:19 +00:00
Tamar Christina
a8dbae4592 middle-end: remove more usages of single_exit
This replaces two more usages of single_exit that I had missed before.
They both seem to happen when we re-use the ifcvt scalar loop for versioning.

The condition in versioning is the same as the one for when we don't re-use the
scalar loop.

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_loop_versioning): Replace single_exit.
	* tree-vect-loop.cc (vect_transform_loop): Likewise.
2024-01-12 15:32:19 +00:00
Tamar Christina
e79c5855ab middle-end: fill in reduction PHI for all alt exits [PR113178]
When we have a loop with more than 2 exits and a reduction, I forgot to fill
in the PHI value for all alternate exits.

All alternate exits use the same PHI value so we should loop over the new
PHI elements and copy the value across since we call the reduction calculation
code only once for all exits.  This was normally covered up by earlier parts of
the compiler rejecting loops incorrectly (which has been fixed now).

Note that while I can use the loop in all cases, the reason I separated out the
main and alt exit is so that if you pass the wrong edge the macro will assert.

gcc/ChangeLog:

	PR tree-optimization/113178
	* tree-vect-loop.cc (vect_create_epilog_for_reduction): Fill in all
	alternate exits.

gcc/testsuite/ChangeLog:

	PR tree-optimization/113178
	* gcc.dg/vect/vect-early-break_101-pr113178.c: New test.
	* gcc.dg/vect/vect-early-break_102-pr113178.c: New test.
2024-01-12 15:31:48 +00:00
Tamar Christina
99c0a540d6 middle-end: thread through existing LCSSA variable for alternative exits too [PR113237]
Building on top of the previous patch: similar to the single-exit case, if
all exits are considered early exits and there are existing non-virtual
PHIs, then in order to maintain LCSSA we have to use the existing PHI
variables.  We can't simply clear them and just rebuild them because the order
of the PHIs in the main exit must match the original exit for when we add the
skip_epilog guard.

But the infrastructure is already in place to maintain them, we just have to use
the right value.

gcc/ChangeLog:

	PR tree-optimization/113237
	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
	existing LCSSA variable for exit when all exits are early break.

gcc/testsuite/ChangeLog:

	PR tree-optimization/113237
	* gcc.dg/vect/vect-early-break_98-pr113237.c: New test.
2024-01-12 15:31:11 +00:00
Tamar Christina
411de96dbf middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
This patch fixes several interconnected issues.

1. When picking an exit we wanted to check that niter_desc.may_be_zero is not
   true, i.e. we want to pick an exit which we know will iterate at least once.
   However niter_desc.may_be_zero is not a boolean.  It is a tree that encodes
   a boolean value.  !niter_desc.may_be_zero just checks whether we have some
   information, not what the information is (see the sketch after this list).
   This led us to pick a more difficult-to-vectorize exit more often than we
   should.

2. Because we had this bug, we used to pick an alternative exit much more often,
   which exposed another issue: when the loop accesses memory and we "invert it"
   we would corrupt the VUSE chain.  This is because on a peeled vector iteration
   every exit restarts the loop (i.e. they're all early), BUT since we may have
   performed a store, the VUSE would need to be updated.  This version maintains
   virtual PHIs correctly in these cases.  Note that we can't simply remove all
   of them and recreate them because we need the PHI nodes still in the right
   order for the skip_vector case.

3. Since we're moving the stores to a safe location I don't think we actually
   need to analyze whether the store is in range of the memref,  because if we
   ever get there, we know that the loads must be in range, and if the loads are
   in range and we get to the store we know the early breaks were not taken and
   so the scalar loop would have done the VF stores too.

4. Instead of searching for where to move stores to, they should always be in
   the exit belonging to the latch.  We can only ever delay stores, and even if
   we pick a different exit than the latch one as the main one, effects still
   happen in program order when vectorized.  If we don't move the stores to the
   latch exit but instead to whatever we pick as the "main" exit, then we can
   perform incorrect memory accesses (luckily these are trapped by verify_ssa).

5. We only used to analyze loads inside the same BB as an early break, and also
   we'd never analyze the ones inside the block where we'd be moving memory
   references to.  This is obviously bogus and to fix it this patch splits apart
   the two constraints.  We first validate that all load memory references are
   in bounds and only after that do we perform the alias checks for the writes.
   This makes the code simpler to understand and more trivially correct.
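
A hedged GCC-internals sketch of the check item 1 is after (my
reconstruction, simplified; not the actual patch hunk):

  /* may_be_zero is a tree encoding a condition, so test what it says,
     not merely whether it is set.  */
  if (integer_zerop (niter_desc.may_be_zero))
    ;  /* This exit is known to iterate at least once.  */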

gcc/ChangeLog:

	PR tree-optimization/113137
	PR tree-optimization/113136
	PR tree-optimization/113172
	PR tree-optimization/113178
	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
	Maintain PHIs on inverted loops.
	(vect_do_peeling): Maintain virtual PHIs on inverted loops.
	* tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closes to
	latch.
	(vect_create_loop_vinfo): Record all conds instead of only alt ones.

gcc/testsuite/ChangeLog:

	PR tree-optimization/113137
	PR tree-optimization/113136
	PR tree-optimization/113172
	PR tree-optimization/113178
	* g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
	* g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
	* gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
	* gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
	* gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
2024-01-12 15:31:06 +00:00
Tamar Christina
6cb155a6cf middle-end: make memory analysis for early break more deterministic [PR113135]
Instead of searching for where to move stores to, they should always be in
the exit belonging to the latch.  We can only ever delay stores, and even if
we pick a different exit than the latch one as the main one, effects still
happen in program order when vectorized.  If we don't move the stores to the
latch exit but instead to whatever we pick as the "main" exit, then we can
perform incorrect memory accesses (luckily these are trapped by verify_ssa).

We used to iterate over the conds and check the loads and stores inside them.
However this relies on the conds being ordered in program order.  Additionally
if there is a basic block between two conds we would not have analyzed it.

Instead this now walks from the preds of the destination basic block up to the
loop header and analyzes every block along the way.  As a later optimization we
could stop as soon as we've seen all the BBs we have conds for.  For now the
header will always contain the first cond, but this can change when we support
arbitrary control flow.

gcc/ChangeLog:

	PR tree-optimization/113135
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Rework
	dependency analysis.

gcc/testsuite/ChangeLog:

	PR tree-optimization/113135
	* gcc.dg/vect/vect-early-break_103-pr113135.c: New test.
2024-01-12 15:24:49 +00:00
Jason Merrill
a0e3d2ff62 c++: cand_parms_match and reversed candidates
When considering whether the candidate parameters match, according to the
language, we're considering the synthesized reversed candidate, so we should
compare the parameters in swapped order.  In this situation it doesn't make
sense to consider whether object parameters correspond, since we're
comparing an object parameter to a non-object parameter, so I generalized
xobj_iobj_parameters_correspond accordingly.

As I refine cand_parms_match, more behaviors need to differ between its
original use to compare the original templates for two candidates, and the
later use to decide whether to compare constraints.  So now there's a
parameter to select between the semantics.

gcc/cp/ChangeLog:

	* call.cc (reversed_match): New.
	(enum class pmatch): New enum.
	(cand_parms_match): Add match_kind parm.
	(object_parms_correspond): Add fn parms.
	(joust): Adjust.
	* class.cc (xobj_iobj_parameters_correspond): Rename to...
	(iobj_parm_corresponds_to): ...this.  Take the other
	type instead of a second function.
	(object_parms_correspond): Adjust.
	* cp-tree.h (iobj_parm_corresponds_to): Declare.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-memfun4.C: Change expected
	reversed handling.
2024-01-12 09:11:24 -05:00
Iain Sandoe
846794ead2 Objective-C, Darwin: Fix a regression in handling bad receivers.
This is seen on 32b hosts with a 64b multilib, and is an ICE when
the build has checking enabled.  The fix is to exit the routine
early if the sender or receiver are already error_mark_node.

gcc/objc/ChangeLog:

	* objc-next-runtime-abi-02.cc
	(build_v2_objc_method_fixup_call): Early exit for cases
	where the sender or receiver are known to be in error.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-01-12 14:09:29 +00:00
Iain Sandoe
4e6f7a3d5c Darwin, powerpc: Fix bootstrap.
Recent changes to the member names of the diagnostics class missed one case in
the Darwin PowerPC host code.  Fixed thus.

gcc/ChangeLog:

	* config/rs6000/host-darwin.cc (segv_handler): Use the revised
	diagnostics class member name for abort on error.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-01-12 14:00:21 +00:00
Georg-Johann Lay
549ea1487a AVR: Work around "sequence of 3 consecutive punctuation characters" warning.
gcc/
	* config/avr/avr.cc (avr_handle_addr_attribute): Move "..." from
	format string to %s argument.
2024-01-12 14:44:03 +01:00
Richard Biener
2b811d9fef Bump BASE-VER to 14.0.1 now that we are in stage4.
* BASE-VER: Bump to 14.0.1.
2024-01-12 14:16:06 +01:00
Jakub Jelinek
c05beab4ae varasm: Fix up process_pending_assemble_externals [PR113182]
John reported that on HP-UX we no longer emit needed external libcalls.

The problem is that we didn't strip name encoding when looking up
the identifiers in assemble_external_libcall and
process_pending_assemble_externals, while
assemble_name_resolve does that:
  const char *real_name = targetm.strip_name_encoding (name);
  tree id = maybe_get_identifier (real_name);

  if (id)
    {
...
      mark_referenced (id);
The intention is that assemble_external_libcall ensures the IDENTIFIER
exists for the external libcall, then for actually emitted calls
assemble_name_resolve sees those IDENTIFIERS and sets TREE_SYMBOL_REFERENCED
on them and finally process_pending_assemble_externals looks the
IDENTIFIER up again and checks its TREE_SYMBOL_REFERENCED.

But without the strip_name_encoding call, they can look up different
identifiers and those are likely never used.

In the PR, John was discussing whether get_identifier or
maybe_get_identifier should be used, I believe in assemble_external_libcall
we definitely want to use get_identifier, we need an IDENTIFIER allocated
so that it can be actually tracked, in process_pending_assemble_externals
it doesn't matter, the IDENTIFIER should be already created.
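
A hedged sketch of the corrected lookup (a simplified fragment; the
actual hunks touch both functions named below):

  const char *real_name = targetm.strip_name_encoding (name);
  tree id = get_identifier (real_name);  /* allocate so it can be tracked */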

2024-01-12  John David Anglin  <danglin@gcc.gnu.org>
	    Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/113182
	* varasm.cc (process_pending_assemble_externals,
	assemble_external_libcall): Use targetm.strip_name_encoding
	before calling get_identifier.
2024-01-12 13:58:07 +01:00
Richard Sandiford
74e3e839ab aarch64: Rework uxtl->zip optimisation [PR113196]
g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than
UXTL{,2}, since the former has a higher throughput than the latter on
many cores.  The optimisation worked by lowering directly to ZIP during
expand, so that the zero input could be hoisted and shared.

However, changing to ZIP means that zero extensions no longer benefit
from some existing combine patterns.  The patch included new patterns
for UADDW and USUBW, but the PR shows that other patterns were affected
as well.

This patch instead introduces the ZIPs during a pre-reload split
and forcibly hoists the zero move to the outermost scope.  This has
the disadvantage of executing the move even for a shrink-wrapped
function, which I suppose could be a problem if it causes a kernel
to trap and enable Advanced SIMD unnecessarily.  In other circumstances,
an unused move shouldn't affect things much.

Also, the RA should be able to rematerialise the move at an
appropriate point if necessary, such as if there is an intervening
call.

In https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641948.html
I'd then tried to allow a zero to be recombined back into a solitary
ZIP.  However, that relied on late-combine, which didn't make it into
GCC 14.  This version instead restricts the split to cases where the
UXTL executes at least as frequently as the entry block (which is where
we plan to put the zero).

Also, the original optimisation contained a big-endian correction
that I don't think is needed/correct.  Even on big-endian targets,
we want the ZIP to take the low half of an element from the input
vector and the high half from the zero vector.  And the patterns
map directly to the underlying Advanced SIMD instructions: the use
of unspecs means that there's no need to adjust for the difference
between GCC and Arm lane numbering.
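
As a hedged illustration (my sketch of the transformation, not taken
from the testcases), a zero-extend such as

  uxtl    v0.8h, v0.8b

can instead be emitted as

  movi    v31.4s, #0
  zip1    v0.16b, v0.16b, v31.16b

where the zero move is hoisted and shared across all such extensions.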

gcc/
	PR target/113196
	* config/aarch64/aarch64.h (machine_function::advsimd_zero_insn):
	New member variable.
	* config/aarch64/aarch64-protos.h (aarch64_split_simd_shift_p):
	Declare.
	* config/aarch64/iterators.md (Vnarrowq2): New mode attribute.
	* config/aarch64/aarch64-simd.md
	(vec_unpacku_hi_<mode>, vec_unpacks_hi_<mode>): Recombine into...
	(vec_unpack<su>_hi_<mode>): ...this.  Move the generation of
	zip2 for zero-extends to...
	(aarch64_simd_vec_unpack<su>_hi_<mode>): ...a split of this
	instruction.  Fix big-endian handling.
	(vec_unpacku_lo_<mode>, vec_unpacks_lo_<mode>): Recombine into...
	(vec_unpack<su>_lo_<mode>): ...this.  Move the generation of
	zip1 for zero-extends to...
	(<optab><Vnarrowq><mode>2): ...a split of this instruction.
	Fix big-endian handling.
	(*aarch64_zip1_uxtl): New pattern.
	(aarch64_usubw<mode>_lo_zip, aarch64_uaddw<mode>_lo_zip): Delete
	(aarch64_usubw<mode>_hi_zip, aarch64_uaddw<mode>_hi_zip): Likewise.
	* config/aarch64/aarch64.cc (aarch64_get_shareable_reg): New function.
	(aarch64_gen_shareable_zero): Use it.
	(aarch64_split_simd_shift_p): New function.

gcc/testsuite/
	PR target/113196
	* gcc.target/aarch64/pr113196.c: New test.
	* gcc.target/aarch64/simd/vmovl_high_1.c: Remove double include.
	Expect uxtl2 rather than zip2.
	* gcc.target/aarch64/vect_mixed_sizes_8.c: Expect zip1 rather
	than uxtl.
	* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
2024-01-12 12:38:01 +00:00
Richard Sandiford
0d74ff2d7e Keep track of the FUNCTION_BEG note
function.cc emits a NOTE_FUNCTION_BEG after all arguments have
been copied to pseudos.  It then records this note in parm_birth_insn.
Various other pieces of code use this insn as a convenient place to
insert things at the start of the function.

However, cfgexpand later changes parm_birth_insn as follows:

  /* If we emitted any instructions for setting up the variables,
     emit them before the FUNCTION_START note.  */
  if (var_seq)
    {
      emit_insn_before (var_seq, parm_birth_insn);

      /* In expand_function_end we'll insert the alloca save/restore
	 before parm_birth_insn.  We've just insertted an alloca call.
	 Adjust the pointer to match.  */
      parm_birth_insn = var_seq;
    }

But the FUNCTION_BEG note is still useful for things that aren't
sensitive to stack allocation, and it has the advantage that
(unlike the var_seq above) it is never deleted or combined.
This patch adds a separate variable to track it.

gcc/
	* emit-rtl.h (rtl_data::x_function_beg_note): New member variable.
	(function_beg_insn): New macro.
	* function.cc (expand_function_start): Initialize function_beg_insn.
2024-01-12 12:38:00 +00:00
Richard Sandiford
81d309168b aarch64: Use a global map to detect duplicated overloads [PR112989]
As explained in the covering note to the previous patch,
the fact that aarch64-sve-* is now used for multiple header
files means that function_builder::add_overloaded_function
now needs to use a global map to detect duplicated overload
functions, instead of the member variable that it used previously.

gcc/
	PR target/112989
	* config/aarch64/aarch64-sve-builtins.h
	(function_builder::m_overload_names): Replace with...
	* config/aarch64/aarch64-sve-builtins.cc (overload_names): ...this
	new global.
	(add_overloaded_function): Update accordingly, using get_identifier
	to get a GGC-friendly record of the name.
2024-01-12 12:29:22 +00:00
Richard Sandiford
d76651d917 aarch64: Use a separate group for SME builtins [PR112989]
The PR shows that we were registering the same overloaded SVE
builtins twice.  This was supposed to be prevented by
function_builder::add_overloaded_function, which uses a map
to detect whether a function of the same name has already been
registered.  add_overloaded_function then had some asserts to
check for consistency.

However, the map that add_overloaded_function uses was a member of
function_builder itself.  That made sense when there was just one
header file, arm_sve.h, since it meant that the memory could be
reclaimed once arm_sve.h had been processed.  But now we have three
header files, and in principle, it's possible for arm_sme.h to include
overloads of things that arm_sve.h also defines.  We therefore need
to use a global map instead.

However, doing that meant that the consistency checks in
add_overloaded_function fired as expected, which showed some
latent issues.  This preliminary patch deals with those by adding
AARCH64_FL_SME to things that require AARCH64_FL_SME2.

This inconsistency led to another problem: functions were selected
for arm_sme.h over arm_sve.h based on whether they had AARCH64_FL_SME.
So some SME2-only things were actually defined in arm_sve.h, whereas
similar SME things were defined in arm_sme.h.

Choosing based on flags was an early get-started crutch that I forgot
to clean up later :(  This patch goes for the more direct approach of
having a separate table of SME builtins, as for arm_neon_sve_bridge.h.

aarch64-sve-builtins-sve2.def contains several intrinsics that are
currently SME-only but that operate entirely on vector registers.
Many of these will be extended to SVE2.1 once SVE2.1 support is added,
so the patch front-loads that by keeping the current division between
aarch64-sve-builtins-sve2.def (whose functions now go in arm_sve.h)
and aarch64-sve-builtins-sme.def (whose functions now go in arm_sme.h).

gcc/
	PR target/112989
	* config/aarch64/aarch64-sve-builtins.def: Don't include
	aarch64-sve-builtins-sme.def.
	(DEF_SME_ZA_FUNCTION_GS, DEF_SME_ZA_FUNCTION): Move to...
	* config/aarch64/aarch64-sve-builtins-sme.def: ...here.
	(DEF_SME_FUNCTION): New macro.  Use it and DEF_SME_FUNCTION_GS
	instead of DEF_SVE_*.  Add AARCH64_FL_SME to anything that
	requires AARCH64_FL_SME2.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Make same
	AARCH64_FL_SME adjustment here.
	* config/aarch64/aarch64-sve-builtins.cc (function_groups): Don't
	include SME intrinsics.
	(sme_function_groups): New array.
	(handle_arm_sve_h): Remove check for AARCH64_FL_SME.
	(handle_arm_sme_h): Use sme_function_groups instead of function_groups.

gcc/testsuite/
	PR target/112989
	* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Remove bogus
	error test.
2024-01-12 12:29:22 +00:00
Juzhe-Zhong
0acb63670b RISC-V: Adjust scalar_to_vec cost
1. Introduce new vector regmove tune info.
2. Adjust scalar_to_vec cost in add_stmt_cost.

We will get optimal codegen after this patch with -march=rv64gcv_zvl256b:

	lui	a5,%hi(a)
	li	a4,19
	sb	a4,%lo(a)(a5)
	li	a0,0
	ret

Tested on both RV32/RV64 with no regressions.  OK for trunk?

	PR target/113281

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (struct regmove_vector_cost): New struct.
	(struct cpu_vector_cost): Add regmove struct.
	(get_vector_costs): Export as global.
	* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Adjust scalar_to_vec cost.
	(costs::add_stmt_cost): Ditto.
	* config/riscv/riscv.cc (get_common_costs): Export global function.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr113209.c: Adapt test.
	* gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/pr113281-2.c: New test.
2024-01-12 20:15:51 +08:00