Commit Graph

214312 Commits

Author SHA1 Message Date
GCC Administrator
ccd7ede939 Daily bump. 2024-10-11 00:17:48 +00:00
Richard Ball
a17a9bdcb3 aarch64: Alter pr116258.c test to correct for big endian.
The test at pr116258.c fails on big endian targets,
this is because the test checks that the index of a floating
point multiply is 0, which is correct only for little endian.

gcc/testsuite/ChangeLog:

	PR tree-optimization/116258
	* gcc.target/aarch64/pr116258.c:
	Alter test to add big-endian support.
2024-10-10 19:16:39 +01:00
Michael Matz
85bee4f77b Fix PR116650: check all regs in regrename targets
(this came up for m68k vs. LRA, but is a generic problem)

Regrename wants to use new registers for certain def-use chains.
For validity of replacements it needs to check that the selected
candidates are unused up to then.  That's done in check_new_reg_p.
But if it so happens that the new register needs more hardregs
than the old register (which happens if the target allows inter-bank
moves and the mode is something like a DFmode that needs to be placed
into a SImode reg-pair), then check_new_reg_p only checks the
first of those registers for free-ness.

This is caused by that function looking up the number of necessary
hardregs only in terms of the old hardreg number.  It of course needs
to do that in terms of the new candidate regnumber.  The symptom is that
regrename sometimes clobbers the higher numbered registers of such a
regrename target pair.  This patch fixes that problem.
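
To make the failure mode concrete, here is a minimal sketch on a toy register file. It is not the code in regrename.cc: the register layout, `toy_nregs` and both `candidate_free_*` helpers are hypothetical names invented for illustration. The only point is that the number of hardregs to check has to be computed from the new candidate register.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical toy target: regs 0-7 are 32-bit registers, regs 8-15 are
// 64-bit registers.  A 64-bit value therefore needs two regs in the first
// bank and one in the second.
constexpr int kNumRegs = 16;

// Stand-in for "how many hardregs does the value need starting at REGNO";
// an assumption of this sketch, not GCC's hard_regno_nregs.
int toy_nregs(int regno) { return regno < 8 ? 2 : 1; }

// Buggy variant: sizes the check by the OLD register.
bool candidate_free_buggy(const std::bitset<kNumRegs>& in_use,
                          int old_reg, int new_reg) {
  int nregs = toy_nregs(old_reg);
  for (int i = nregs - 1; i >= 0; --i)
    if (new_reg + i >= kNumRegs || in_use[new_reg + i])
      return false;
  return true;
}

// Fixed variant: sizes the check by the NEW candidate register.
bool candidate_free_fixed(const std::bitset<kNumRegs>& in_use, int new_reg) {
  int nregs = toy_nregs(new_reg);
  for (int i = nregs - 1; i >= 0; --i)
    if (new_reg + i >= kNumRegs || in_use[new_reg + i])
      return false;
  return true;
}

// Reg 1 is live; renaming reg 8 (one hardreg) to reg 0 (pair 0/1) must fail.
void check_regrename_model() {
  std::bitset<kNumRegs> in_use;
  in_use.set(1);
  assert(candidate_free_buggy(in_use, 8, 0));   // wrongly accepted
  assert(!candidate_free_fixed(in_use, 0));     // correctly rejected
}
```

In this model the buggy check only looks at reg 0 and so would let the rename clobber reg 1, which mirrors the symptom described above.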

(In the particular case of the bug report it was LRA that left over an
inter-bank move instruction that triggers regrename, ultimately causing
the mis-compile.  Reload didn't do that, but in general we of course
can't rely on such moves not happening if the target allows them.)

This also shows a general confusion in that function and the target hook
interface here:

  for (i = nregs - 1; i >= 0; --i)
    ...
    || ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))

it uses nregs in a way that requires it to be the same between old and
new register.  The problem is that the target hook only gets register
numbers, when it instead should get a mode and register numbers and
would be called only for the first but not for subsequent registers.
I've looked at a number of definitions of that target hook and I think
that this is currently harmless in the sense that it would merely rule
out some potential reg-renames that would in fact be okay to do.  So I'm
not changing the target hook interface here and hence that problem
remains unfixed.

	PR rtl-optimization/116650
	* regrename.cc (check_new_reg_p): Calculate nregs in terms of
	the new candidate register.
2024-10-10 17:51:37 +02:00
Andrew Pinski
dc3015ff09 phiopt: Remove candorest variable return instead
After r15-3560-gb081e6c860eb9688d24365d39, the setting of candorest
with the break can just change to a return since this is inside a lambda now.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (pass_phiopt::execute): Remove candorest
	and return instead of setting candorest.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-10-10 15:32:36 +00:00
Li Xu
fd8e590ff1 RISC-V: Bugfix for C++ code compilation failure with rv32imafc_zve32f[pr116883]
From: xuli <xuli1@eswincomputing.com>

Example as follows:

int main()
{
  unsigned long arraya[128], arrayb[128], arrayc[128];
  for (int i = 0; i < 128; i++)
   {
      arraya[i] = arrayb[i] + arrayc[i];
   }
  return 0;
}

Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a compilation issue:

riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t __riscv_vle64(vbool16_t, const long long int*, unsigned int)'
   40 | #pragma riscv intrinsic "vector"
      |                         ^~~~~~~~
riscv_vector.h:40:25: note: old declaration 'vint64m1_t __riscv_vle64(vbool64_t, const long long int*, unsigned int)'

With zvl=32b, vbool16_t is registered in init_builtins() with
type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].

Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t
is not registered in init_builtins(), meaning vbool64_t=null.

In order to implement __attribute__((target("arch=+v"))), we must register
all vector types and all RVV intrinsics. Therefore, vbool64_t will be registered
by default with zvl=128b in reinit_builtins(), resulting in
type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].

We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == TYPE_VECTOR_SUBPARTS(vbool64_t),
calculated using type_common.precision, resulting in 2. Since vbool16_t and
vbool64_t have the same element type (boolean_type), the compiler treats them
as the same type, leading to a re-declaration conflict.

After all types and intrinsics have been registered, processing
__attribute__((target("arch=+v"))) will update the parameters option and
init_adjust_machine_modes. Therefore, to avoid conflicts, we can choose
zvl=4096b for the null type reinit_builtins().

command option zvl=32b
  type          nunits
  vbool64_t =>  null
  vbool32_t =>  [1,1]
  vbool16_t =>  [2,2]
  vbool8_t  =>  [4,4]
  vbool4_t  =>  [8,8]
  vbool2_t  =>  [16,16]
  vbool1_t  =>  [32,32]

reinit zvl=128b
  vbool64_t => [2,2] conflict with zvl32b vbool16_t=> [2,2]
reinit zvl=256b
  vbool64_t => [4,4] conflict with zvl32b vbool8_t=>  [4,4]
reinit zvl=512b
  vbool64_t => [8,8] conflict with zvl32b vbool4_t=>  [8,8]
reinit zvl=1024b
  vbool64_t => [16,16] conflict with zvl32b vbool2_t=>  [16,16]
reinit zvl=2048b
  vbool64_t => [32,32] conflict with zvl32b vbool1_t=>  [32,32]
reinit zvl=4096b
  vbool64_t => [64,64] zvl=4096b is ok
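
The search in the table above can be reproduced with a small sketch. It assumes, based only on the numbers listed here, that the nunits of vbool<N>_t is zvl / N, with 0 meaning the type is not registered. The helper names and the upper search bound are hypothetical and do not come from riscv-c.cc.

```cpp
#include <cassert>
#include <set>

// Assumed model read off the table: nunits of vbool<ratio>_t at a given
// zvl is zvl / ratio; a result of 0 means the type is not registered.
unsigned mask_nunits(unsigned zvl_bits, unsigned ratio) {
  return zvl_bits / ratio;
}

// Find the smallest power-of-two zvl (starting at 128, as in the table)
// at which vbool<null_ratio>_t gets an nunits value that no mask type
// registered under the command-line zvl already uses.  Returns 0 if no
// such zvl exists below the arbitrary bound of this sketch.
unsigned smallest_non_conflicting_zvl(unsigned cmdline_zvl,
                                      unsigned null_ratio) {
  std::set<unsigned> taken;
  for (unsigned ratio = 1; ratio <= 64; ratio *= 2) {
    unsigned n = mask_nunits(cmdline_zvl, ratio);
    if (n != 0)
      taken.insert(n);
  }
  for (unsigned zvl = 128; zvl <= 65536; zvl *= 2)
    if (taken.count(mask_nunits(zvl, null_ratio)) == 0)
      return zvl;
  return 0;
}

void check_zvl_model() {
  assert(mask_nunits(32, 16) == 2);    // vbool16_t at zvl=32b
  assert(mask_nunits(128, 64) == 2);   // vbool64_t at zvl=128b: conflict
  assert(smallest_non_conflicting_zvl(32, 64) == 4096);
}
```

Under this model every zvl from 128b to 2048b collides with one of the types registered for zvl=32b, and 4096b is the first value that does not.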

Signed-off-by: xuli <xuli1@eswincomputing.com>

	PR target/116883

gcc/ChangeLog:

	* config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute): Choose zvl4096b
	to initialize null type.

gcc/testsuite/ChangeLog:

	* g++.target/riscv/rvv/base/pr116883.C: New test.
2024-10-10 08:51:55 -06:00
Richard Sandiford
9bd19ff515 vect: Avoid divide by zero for permutes of extern VLA vectors
My recent VLA SLP patches caused a regression with cross compilers
in gcc.dg/torture/neon-sve-bridge.c.  There we have a VEC_PERM_EXPR
created from two BIT_FIELD_REFs, with the child node being an
external VLA vector:

note:   node 0x3704a70 (max_nunits=1, refcnt=2) vector(2) long int
note:   op: VEC_PERM_EXPR
note:          stmt 0 val1Return_9 = BIT_FIELD_REF <sveReturn_8, 64, 0>;
note:          stmt 1 val2Return_10 = BIT_FIELD_REF <sveReturn_8, 64, 64>;
note:          lane permutation { 0[0] 0[1] }
note:          children 0x3704b08
note:   node (external) 0x3704b08 (max_nunits=1, refcnt=1) svint64_t
note:          { }

For this kind of external node, the SLP_TREE_LANES is normally
the total number of lanes in the vector, but it is zero if the
vector has variable length:

      auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
      unsigned HOST_WIDE_INT const_nunits;
      if (nunits.is_constant (&const_nunits))
	SLP_TREE_LANES (vnode) = const_nunits;

This led to division by zero in:

      /* Check whether the output has N times as many lanes per vector.  */
      else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
				    SLP_TREE_LANES (child) * nunits,
				    &this_unpack_factor)
	       && (i == 0 || unpack_factor == this_unpack_factor))
	unpack_factor = this_unpack_factor;

No repetition takes place for this kind of external node, so this
patch goes with Richard's suggestion to check for external nodes
that have no scalar statements.

This didn't show up for my native testing since division by zero
doesn't trap on AArch64.

gcc/
	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Set repeating_p
	to false if we have an external node for a pre-existing vector.
2024-10-10 15:15:26 +01:00
Simon Martin
c1b2100e73 libiberty: Restore build with CP_DEMANGLE_DEBUG defined
cp-demangle.c does not build when CP_DEMANGLE_DEBUG is defined since
r13-2887-gb04208895fed34. This trivial patch fixes the issue.

libiberty/ChangeLog:

	* cp-demangle.c (d_dump): Fix compilation when CP_DEMANGLE_DEBUG
	is defined.
2024-10-10 15:59:38 +02:00
Richard Biener
7ce2229d54 tree-optimization/117060 - fix oversight in vect_build_slp_tree_1
We are failing to match call vs. non-call when dealing with matching
loads or stores.

	PR tree-optimization/117060
	* tree-vect-slp.cc (vect_build_slp_tree_1): When comparing
	calls also fail if the first isn't a call.

	* gfortran.dg/pr117060.f90: New testcase.
2024-10-10 15:26:17 +02:00
Jennifer Schmitz
a2e06b7f08 match.pd: Check trunc_mod vector optab before folding.
This patch guards the simplification x / y * y == x -> x % y == 0 in
match.pd by a check for:
1) Non-vector mode of x OR
2) Lack of support for vector division OR
3) Support of vector modulo
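
For scalar integers, the equivalence behind this simplification can be checked exhaustively over a small range. The sketch below is only an illustration of the identity with C++ truncating division; the function names are hypothetical, and it says nothing about how match.pd expresses the guard.

```cpp
#include <cassert>

// Left-hand side of the simplification: x / y * y == x.
bool via_div_mul(int x, int y) { return x / y * y == x; }

// Right-hand side: x % y == 0 (trunc_mod for C++ integers).
bool via_mod(int x, int y) { return x % y == 0; }

// Compare both forms for all x, y in [-limit, limit] with y != 0.
// The limit is kept small so that no intermediate result overflows.
bool forms_agree(int limit) {
  for (int x = -limit; x <= limit; ++x)
    for (int y = -limit; y <= limit; ++y)
      if (y != 0 && via_div_mul(x, y) != via_mod(x, y))
        return false;
  return true;
}

void check_mod_identity() {
  assert(forms_agree(50));
  assert(via_div_mul(12, 4) && via_mod(12, 4));
  assert(!via_div_mul(13, 4) && !via_mod(13, 4));
}
```

The identity itself always holds; the guard matters for vectors, where a target may provide vector division but no vector modulo.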

The patch was bootstrapped and tested with no regression on
aarch64-linux-gnu and x86_64-linux-gnu.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	PR tree-optimization/116831
	* match.pd: Guard simplification to trunc_mod with check for
	mod optab support.

gcc/testsuite/
	PR tree-optimization/116831
	* gcc.dg/torture/pr116831.c: New test.
2024-10-10 10:31:01 +02:00
Richard Biener
bcccc3221b Allow SLP store of mixed external and constant
vect_build_slp_tree_1 rejected this during SLP discovery because it
ran into the rhs code comparison code for stores.  The following
skips that completely for loads and stores as those are handled
later anyway.

This needs a heuristic adjustment in vect_get_and_check_slp_defs
to avoid fallout with regard to BB vectorization and splitting
of a store group vs. demoting one operand to external.

gcc.dg/Wstringop-overflow-47.c needs adjustment given we now have
vast improvements for code generation.  gcc.dg/strlenopt-32.c
needs adjustment because the strlen pass doesn't handle

  _11 = {0, b_6(D)};
  __builtin_memcpy (&a, "foo.bar", 8);
  MEM <vector(2) char> [(char *)&a + 3B] = _11;
  _9 = strlen (&a);

I have opened PR117057 for this.

	* tree-vect-slp.cc (vect_build_slp_tree_1): Do not compare
	RHS codes for loads or stores.
	(vect_get_and_check_slp_defs): Only demote operand to external
	in case there is more than one operand.

	* gcc.dg/vect/slp-57.c: New testcase.
	* gcc.dg/Wstringop-overflow-47.c: Adjust.
	* gcc.dg/strlenopt-32.c: XFAIL parts.
2024-10-10 09:00:07 +02:00
liuhongt
9eaecce3d8 Add a new tune avx256_avoid_vec_perm for SRF.
According to the Intel SOM[1], for Crestmont most 256-bit Intel AVX2
instructions can be decomposed into two independent 128-bit
micro-operations. The exception is a subset of Intel AVX2 instructions,
known as cross-lane operations, which can only compute the result for an
element by utilizing one or more sources belonging to other elements.

The 256-bit instructions listed below use more operand sources than
can be natively supported by a single reservation station within these
microarchitectures. They are decomposed into two μops, where the first
μop resolves a subset of operand dependencies across two cycles. The
dependent second μop executes the 256-bit operation by using a single
128-bit execution port for two consecutive cycles with a five-cycle
latency for a total latency of seven cycles.

VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
VPERM2F128 ymm1, ymm2, ymm3/m256, imm8
VPERMPD ymm1, ymm2/m256, imm8
VPERMPS ymm1, ymm2, ymm3/m256
VPERMD ymm1, ymm2, ymm3/m256
VPERMQ ymm1, ymm2/m256, imm8

Instead of setting tune avx128_optimal for SRF, the patch adds a new
tune avx256_avoid_vec_perm for it. By default, the vectorizer still
uses 256-bit VF if the cost is profitable, but lowers to 128-bit whenever
a 256-bit vec_perm is needed for auto-vectorization. Without vec_perm,
the performance of 256-bit vectorization should be similar to that of
128-bit vectorization (some benchmark results show it is even better,
since it enables more parallelism for convert cases).

[1] https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs):
	Add new member m_num_avx256_vec_perm.
	(ix86_vector_costs::add_stmt_cost): Record 256-bit vec_perm.
	(ix86_vector_costs::finish_cost): Prevent vectorization for
	TARGET_AVX256_AVOID_VEC_PERM when there's 256-bit vec_perm
	instruction.
	* config/i386/i386.h (TARGET_AVX256_AVOID_VEC_PERM): New
	Macro.
	* config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): Add
	m_CORE_ATOM.
	(X86_TUNE_AVX256_AVOID_VEC_PERM): New tune.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx256_avoid_vec_perm.c: New test.
2024-10-10 10:21:29 +08:00
liuhongt
9c8cea8feb Add new microarchitecture tune for SRF/GRR/CWF.
For Crestmont, 4-operand VEX blendv instructions come from MSROM and
are slower than the 3-instruction sequence (op1 & mask) | (op2 & ~mask).
Legacy blendv instructions can still be handled by the decoder.

The patch adds a new tune which is enabled for all processors except
for SRF/CWF. It will use vpand + vpandn + vpor instead of
vpblendvb (similarly for vblendvps/vblendvpd) for SRF/CWF.
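
The and/andn/or sequence can be illustrated on a single 32-bit lane. This is a scalar sketch, not the expander code: `select_lane` is a hypothetical name, and it assumes the mask lane is either all ones or all zeros, as a vector comparison would produce.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of (op1 & mask) | (op2 & ~mask) on one 32-bit lane.
// Assumes each mask lane is all ones (take op1) or all zeros (take op2).
std::uint32_t select_lane(std::uint32_t mask, std::uint32_t op1,
                          std::uint32_t op2) {
  return (op1 & mask) | (op2 & ~mask);
}

void check_select_lane() {
  assert(select_lane(0xFFFFFFFFu, 0x11111111u, 0x22222222u) == 0x11111111u);
  assert(select_lane(0x00000000u, 0x11111111u, 0x22222222u) == 0x22222222u);
}
```

With a mask that is not all ones or all zeros per lane, the bitwise sequence mixes bits from both operands, so the replacement is only a blend under that assumption.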

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Guard
	instruction blendv generation under new tune.
	* config/i386/i386.h (TARGET_SSE_MOVCC_USE_BLENDV): New Macro.
	* config/i386/x86-tune.def (X86_TUNE_SSE_MOVCC_USE_BLENDV):
	New tune.
2024-10-10 10:21:29 +08:00
Levy Hsu
8718727509 x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction
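
As a rough illustration of what the subject line describes: a bfloat16 value is the upper 16 bits of an IEEE binary32 float, so a truncating conversion can be done with a logical right shift by 16, which is the operation PSRLD performs on packed 32-bit lanes. The sketch below is an assumption about the idea only. It does not reproduce the insn pattern in i386.md, `truncate_to_bf16_bits` is a hypothetical name, and it does no rounding or NaN handling.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Keep the upper 16 bits of the float's bit pattern (sign, exponent and
// the top 7 mantissa bits).  Pure truncation, no rounding.
std::uint16_t truncate_to_bf16_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<std::uint16_t>(bits >> 16);
}

void check_bf16_truncation() {
  assert(truncate_to_bf16_bits(1.0f) == 0x3F80);
  assert(truncate_to_bf16_bits(-2.0f) == 0xC000);
}
```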
gcc/ChangeLog:

	* config/i386/i386.md: Rewrite insn truncsfbf2.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/truncsfbf-1.c: New test.
	* gcc.target/i386/truncsfbf-2.c: New test.
2024-10-10 01:54:32 +00:00
David Malcolm
00ede02bc8 diagnostics: move text output member functions to correct file
No functional change intended.

gcc/ChangeLog:
	* diagnostic-format-text.cc
	(diagnostic_text_output_format::after_diagnostic): Replace call to
	show_any_path with body, taken from diagnostic.cc.
	(diagnostic_text_output_format::build_prefix): Move here from
	diagnostic.cc, updating to use get_diagnostic_kind_text and
	diagnostic_get_color_for_kind.
	(diagnostic_text_output_format::file_name_as_prefix): Move here
	from diagnostic.cc
	(diagnostic_text_output_format::append_note): Likewise.
	* diagnostic-format-text.h
	(diagnostic_text_output_format::show_any_path): Drop decl.
	* diagnostic.cc
	(diagnostic_text_output_format::file_name_as_prefix): Move to
	diagnostic-format-text.cc.
	(diagnostic_text_output_format::build_prefix): Likewise.
	(diagnostic_text_output_format::show_any_path): Move to body of
	diagnostic_text_output_format::after_diagnostic.
	(diagnostic_text_output_format::append_note): Move to
	diagnostic-format-text.cc.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-10-09 21:26:09 -04:00
David Malcolm
a4e4f2d225 diagnostics: mark the JSON output format as deprecated
The bulk of the documentation for -fdiagnostics-format= is taken up
by a description of the "json" format added in r9-4156-g478dd60ddcf177.

I don't plan to add any extra features to the "json" format; all my
future work on machine-readable GCC diagnostics is likely to be on the
SARIF output format (https://gcc.gnu.org/wiki/SARIF).

Hence users seeking machine-readable output from GCC should use SARIF.

This patch removes the long documentation of the format and describes it
as deprecated.

gcc/ChangeLog:
	* doc/invoke.texi (fdiagnostics-format): Describe "json" et al as
	deprecated, and remove the long description of the output format.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-10-09 21:26:09 -04:00
David Malcolm
8d0de31c93 lto: reimplement print_lto_docs_link [PR116613]
gcc/ChangeLog:
	PR other/116613
	* lto-wrapper.cc (print_lto_docs_link): Use a format string rather
	than building the string manually.  Fix memory leak of "url" by
	using label_text.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-10-09 21:26:08 -04:00
Sébastien Michelland
e95512e2d5 SH: Use softfp for sh-elf
libgcc/ChangeLog:

	PR target/29845
	* config.host (sh-*-elf*): Replace fdpbit with softfp.
	* config/sh/sfp-machine.h: New file.

Signed-off-by: Sébastien Michelland <sebastien.michelland@lcis.grenoble-inp.fr>
2024-10-10 09:29:33 +09:00
GCC Administrator
e9a213810a Daily bump. 2024-10-10 00:19:03 +00:00
liuhongt
d5d1189c12 Adjust testcases after relaxing O2 vectorization.
gcc/testsuite/ChangeLog:

	* gcc.dg/fstack-protector-strong.c: Adjust
	scan-assembler-times.
	* gcc.dg/graphite/scop-6.c: Refine the testcase to avoid array
	out of bounds.
	* gcc.dg/graphite/scop-9.c: Ditto.
	* gcc.dg/tree-ssa/ivopts-lt-2.c: Add -fno-tree-vectorize.
	* gcc.dg/tree-ssa/ivopts-lt.c: Ditto.
	* gcc.dg/tree-ssa/loop-16.c: Ditto.
	* gcc.dg/tree-ssa/loop-28.c: Ditto.
	* gcc.dg/tree-ssa/loop-bound-2.c: Ditto.
	* gcc.dg/tree-ssa/loop-bound-4.c: Ditto.
	* gcc.dg/tree-ssa/loop-bound-6.c: Ditto.
	* gcc.dg/tree-ssa/predcom-4.c: Ditto.
	* gcc.dg/tree-ssa/predcom-5.c: Ditto.
	* gcc.dg/tree-ssa/scev-11.c: Ditto.
	* gcc.dg/tree-ssa/scev-9.c: Ditto.
	* gcc.dg/tree-ssa/split-path-11.c: Ditto.
	* gcc.dg/unroll-8.c: Ditto.
	* gcc.dg/var-expand1.c: Ditto.
	* gcc.dg/vect/vect-cost-model-6.c: Removed.
	* gcc.target/i386/pr86270.c: Ditto.
	* gcc.target/i386/pr86722.c: Ditto.
	* gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto.
2024-10-10 07:18:38 +08:00
liuhongt
70c3db511b Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.
gcc/ChangeLog:

	* tree-vect-loop.cc (vect_analyze_loop_costing): Enable
	vectorization for LOOP_VINFO_PEELING_FOR_NITER in very cheap
	cost model.
	(vect_analyze_loop): Disable epilogue vectorization in very
	cheap cost model.
	* doc/invoke.texi: Adjust documents for very-cheap cost model.
2024-10-10 07:18:38 +08:00
Jovan Vukic
c8957c8779 RISC-V: Optimize branches with shifted immediate operands
After the valuable feedback I received, it’s clear to me that the
oversight was in the tests showing the benefits of the patch. In the
test file, I added functions f5 and f6, which now generate more
efficient code with fewer instructions.

Before the patch:

f5:
        li      a4,2097152
        addi    a4,a4,-2048
        li      a5,1167360
        and     a0,a0,a4
        addi    a5,a5,-2048
        beq     a0,a5,.L4

f6:
        li      a5,3407872
        addi    a5,a5,-2048
        and     a0,a0,a5
        li      a5,1114112
        beq     a0,a5,.L7

After the patch:

f5:
        srli    a5,a0,11
        andi    a5,a5,1023
        li      a4,569
        beq     a5,a4,.L5

f6:
        srli    a5,a0,11
        andi    a5,a5,1663
        li      a4,544
        beq     a5,a4,.L9
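
The constants in the two listings are consistent with shifting both the mask and the compared value right by their common trailing zeros (11 bits here). A small sketch of that equivalence follows. The function names are hypothetical, and it assumes the mask and the value each have at least `shift` trailing zero bits.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (x & mask) == val.
bool masked_eq_original(std::uint32_t x, std::uint32_t mask,
                        std::uint32_t val) {
  return (x & mask) == val;
}

// Shifted form: ((x >> shift) & (mask >> shift)) == (val >> shift).
// Equivalent only if mask and val both have >= shift trailing zeros.
bool masked_eq_shifted(std::uint32_t x, std::uint32_t mask,
                       std::uint32_t val, unsigned shift) {
  return ((x >> shift) & (mask >> shift)) == (val >> shift);
}

void check_shifted_branch() {
  // f5: li 2097152 / addi -2048 and li 1167360 / addi -2048.
  const std::uint32_t mask5 = 2097152u - 2048u, val5 = 1167360u - 2048u;
  // f6: li 3407872 / addi -2048 and li 1114112.
  const std::uint32_t mask6 = 3407872u - 2048u, val6 = 1114112u;
  assert((mask5 >> 11) == 1023u && (val5 >> 11) == 569u);
  assert((mask6 >> 11) == 1663u && (val6 >> 11) == 544u);
  for (std::uint32_t x = 0; x < (1u << 22); x += 997u) {
    assert(masked_eq_original(x, mask5, val5) ==
           masked_eq_shifted(x, mask5, val5, 11));
    assert(masked_eq_original(x, mask6, val6) ==
           masked_eq_shifted(x, mask6, val6, 11));
  }
}
```

After the shift, the mask and the value fit the short immediate forms shown in the "after" listings (andi 1023 / li 569 and andi 1663 / li 544).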

	PR target/115921

gcc/ChangeLog:

	* config/riscv/iterators.md (any_eq): New code iterator.
	* config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro.
	(SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto.
	* config/riscv/riscv.md (*branch<ANYI:mode>_shiftedarith_<optab>_shifted):
	New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/branch-1.c: Additional tests.
2024-10-09 16:53:38 -06:00
Jeff Law
df3bda457b Revert "RISC-V: Add implication for M extension."
This reverts commit 0a193466f2.
2024-10-09 16:22:06 -06:00
Jeff Law
e889235cb0 Revert "RISC-V: Enable builtin __riscv_mul with Zmmul extension."
This reverts commit 2990f5802a.
2024-10-09 16:21:56 -06:00
Eric Botcazou
7ac96b05cf Fix LTO bootstrap failure with -Werror=lto-type-mismatch
In GNAT's implementation model, using convention C (or C_Pass_By_Copy) has
no effect on the internal representation of types since the representation
is identical to that of C by default.  It's even counter-productive given
the implementation advice listed in B.3(63-71) so the interface between the
front-end and gigi does not use it and instead uses structurally identical
types on both sides.

gcc/ada
	PR ada/117038
	* fe.h (struct c_array): Add 'const' to declaration of pointer.
	(C_Source_Buffer): Use consistent formatting.
	* par-ch3.adb (P_Component_Items): Properly set Aliased_Present on
	access definition.
	* sinput.ads: Remove clause for Interfaces.C.
	(C_Array): Change type of Length to Integer and make both components
	aliased.  Remove Convention aspect.
	(C_Source_Buffer): Remove all aspects.
	* sinput.adb (C_Source_Buffer): Adjust to above change.
2024-10-09 23:44:23 +02:00
Eric Botcazou
820cd5266e Remove support for HP-UX 10
gcc/ada
	* Makefile.rtl: Remove HP-UX 10 section.
	* libgnarl/s-osinte__hpux-dce.ads: Delete.
	* libgnarl/s-osinte__hpux-dce.adb: Likewise.
	* libgnarl/s-taprop__hpux-dce.adb: Likewise.
	* libgnarl/s-taspri__hpux-dce.ads: Likewise.
	* libgnat/s-oslock__hpux-dce.ads: Likewise.
2024-10-09 23:44:22 +02:00
Jason Merrill
dcee0b6547 c++: more modules and -M
In r15-4119-gc877a27f04f648 I told preprocess_file to use the
directives-only scan with modules, but it seems that I also need to set the
cpp_option so that communication between _cpp_handle_directive and
scan_translation_unit_directives_only works properly in
c-c++-common/cpp/embed-6.c.

gcc/c-family/ChangeLog:

	* c-ppoutput.cc (preprocess_file): Set directives_only flag.
2024-10-09 17:28:35 -04:00
Jason Merrill
d264b75eb2 libcpp: fix typo
libcpp/ChangeLog:

	* macro.cc (_cpp_pop_context): Fix typo.
2024-10-09 17:28:35 -04:00
Torbjörn SVENSSON
08e91d71e5 testsuite: arm: use effective-target for mod* tests
This fixes a typo introduced in r15-4200-gcf08dd297ca that was reported
at https://linaro.atlassian.net/browse/GNU-1369.

gcc/testsuite/ChangeLog

	* gcc.target/arm/mod_2.c: Corrected effective-target to
	arm_cpu_cortex_a57_ok.
	* gcc.target/arm/mod_256.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-10-09 22:08:22 +02:00
Jonathan Wakely
4f97411c0d libstdc++: Test 17_intro/names.cc with -D_FORTIFY_SOURCE=2 [PR116210]
Add a new testcase that repeats 17_intro/names.cc but with
_FORTIFY_SOURCE defined, to find problems in Glibc fortify wrappers like
https://sourceware.org/bugzilla/show_bug.cgi?id=32052 (which is fixed
now).

libstdc++-v3/ChangeLog:

	PR libstdc++/116210
	* testsuite/17_intro/names.cc (sz): Undef for versions of Glibc
	that use it in the fortify wrappers.
	* testsuite/17_intro/names_fortify.cc: New test.
2024-10-09 14:05:50 +01:00
Jonathan Wakely
5247ee086f libstdc++: Drop format attribute from snprintf wrapper [PR116969]
When __LONG_DOUBLE_IEEE128__ is defined we need to declare a wrapper for
Glibc's 'snprintf' symbol, so we can call the original definition that
works with the IBM128 format of long double. Because we were declaring
the wrapper using __typeof__(__builtin_snprintf) it inherited the
__attribute__((format(printf, 3, 4))) decoration, and then we got a
warning for calling that wrapper with an __ibm128 argument for a %Lf
conversion specifier. The warning is bogus, because the function we're
calling really does want __ibm128 for %Lf, but there's no "printf but
with a different long double format" archetype for the attribute.

In r15-4039-g28911f626864e7 I added a diagnostic pragma to suppress the
warning, but it would be better to just declare the wrapper without the
attribute, and not have to suppress a warning for code that we know is
actually correct.

libstdc++-v3/ChangeLog:

	PR libstdc++/116969
	* include/bits/locale_facets_nonio.tcc (money_put::__do_put):
	Remove diagnostic pragmas.
	(__glibcxx_snprintfibm128): Declare type manually, instead of
	using __typeof__(__builtin_snprintf).
2024-10-09 14:05:49 +01:00
Frank Scheiner
c0bc9a153a libstdc++: Workaround glibc headers on ia64-linux
We see:

```
FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)
```

...on ia64-linux.

This is due to:

* /usr/include/bits/sigcontext.h:32-38:
```
32 struct __ia64_fpreg
33   {
34     union
35       {
36         unsigned long bits[2];
37       } u;
38   } __attribute__ ((__aligned__ (16)));
```

* /usr/include/sys/ucontext.h:39-45:
```
  39 struct __ia64_fpreg_mcontext
  40   {
  41     union
  42       {
  43         unsigned long __ctx(bits)[2];
  44       } __ctx(u);
  45   } __attribute__ ((__aligned__ (16)));
```

...from glibc 2.39 (w/ia64 support re-added). See the discussion
starting on [1].

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654487.html

Signed-off-by: Frank Scheiner <frank.scheiner@web.de>

libstdc++-v3/ChangeLog:

	* testsuite/17_intro/names.cc [__linux__ && __ia64__]: Undefine
	'u' as used in glibc headers.
2024-10-09 14:05:49 +01:00
Richard Sandiford
fee3adbac0 aarch64: Fix SVE ACLE gimple folds for C++ LTO [PR116629]
The SVE ACLE code has two ways of handling overloaded functions.
One, used by C, is to define a single dummy function for each unique
overloaded name, with resolve_overloaded_builtin then resolving calls
to real non-overloaded functions.  The other, used by C++, is to
define a separate function for each individual overload.

The builtins harness assigns integer function codes programmatically.
However, LTO requires it to use the same assignment for every
translation unit, regardless of language.  This means that C++ TUs
need to create (unused) slots for the C overloads and that C TUs
need to create (unused) slots for the C++ overloads.

In many ways, it doesn't matter whether the LTO frontend itself
uses the C approach or the C++ approach to defining overloaded
functions, since the LTO frontend never has to resolve source-level
overloading.  However, the C++ approach of defining a separate
function for each overload means that C++ calls never need to
be redirected to a different function.  Calls to an overload
can appear in the LTO dump and survive until expand.  In contrast,
calls to C's dummy overload functions are resolved by the front
end and never survive to LTO (or expand).

Some optimisations work by moving between sibling functions, such as _m
to _x.  If the source function is an overload, the expected destination
function is too.  The LTO frontend needs to define C++ overloads if it
wants to do this optimisation properly for C++.

The PR is about a tree checking failure caused by trying to use a
stubbed-out C++ overload in LTO.  Dealing with that by detecting the
stub (rather than changing which overloads are defined) would have
turned this from an ice-on-valid to a missed optimisation.

In future, it would probably make sense to redirect overloads to
non-overloaded functions during gimple folding, in case that exposes
more CSE opportunities.  But it'd probably be of limited benefit, since
it should be rare for code to mix overloaded and non-overloaded uses of
the same operation.  It also wouldn't be suitable for backports.

gcc/
	PR target/116629
	* config/aarch64/aarch64-sve-builtins.cc
	(function_builder::function_builder): Use direct overloads for LTO.

gcc/testsuite/
	PR target/116629
	* gcc.target/aarch64/sve/acle/general/pr106326_2.c: New test.
2024-10-09 13:57:36 +01:00
Richard Sandiford
b94331d9a3 testsuite: Make check-function-bodies work with LTO
This patch tries to make check-function-bodies automatically
choose between reading the regular assembly file and reading the
LTO assembly file.  There should only ever be one right answer,
since check-function-bodies doesn't make sense on slim LTO output.

Maybe this will turn out to be impossible to get right, but I'd like
to try at least.

gcc/testsuite/
	* lib/scanasm.exp (check-function-bodies): Look in ltrans0.ltrans.s
	if the test appears to be using LTO.
2024-10-09 13:57:36 +01:00
Jonathan Wakely
9a5ac633f0 libstdc++: Ignore _GLIBCXX_USE_POSIX_SEMAPHORE if not supported [PR116992]
If _GLIBCXX_HAVE_POSIX_SEMAPHORE is undefined then users get an error
when defining _GLIBCXX_USE_POSIX_SEMAPHORE. We can just ignore it
instead (and warn them it's being ignored).

This fixes a testsuite failure on hppa64-hp-hpux11.11 (and probably some
other targets):

FAIL: 30_threads/semaphore/platform_try_acquire_for.cc  -std=gnu++20 (test for excess errors)
Excess errors:
semaphore:49: error: '__semaphore_impl' has not been declared

libstdc++-v3/ChangeLog:

	PR libstdc++/116992
	* include/bits/semaphore_base.h (_GLIBCXX_USE_POSIX_SEMAPHORE):
	Undefine and issue a warning if POSIX sem_t is not supported.
	* testsuite/30_threads/semaphore/platform_try_acquire_for.cc:
	Prune new warning.
2024-10-09 13:41:06 +01:00
Jonathan Wakely
e998014d1b libstdc++: Fix -Wnarrowing in <complex> [PR116991]
When _GLIBCXX_USE_C99_COMPLEX_ARC is undefined we use the generic
__complex_acos function template for _Float32 etc. and that gives a
-Wnarrowing warning:

complex:2043: warning: ISO C++ does not allow converting to '_Float32' from 'long double' with greater conversion rank [-Wnarrowing]

Use a cast to do the conversion so that it doesn't warn.

libstdc++-v3/ChangeLog:

	PR libstdc++/116991
	* include/std/complex (__complex_acos): Cast literal to
	destination type.
2024-10-09 13:41:06 +01:00
Jonathan Wakely
f5021ce9aa libstdc++: Fix -Wsign-compare in std::latch::count_down
Also add assertions for the precondition on the parameter's value.

libstdc++-v3/ChangeLog:

	* include/std/latch (latch::count_down): Add assertions for
	preconditions. Cast parameter to avoid -Wsign-compare on some
	targets.
2024-10-09 13:41:06 +01:00
Jonathan Wakely
361d230fd7 libstdc++: Enable _GLIBCXX_ASSERTIONS by default for -O0 [PR112808]
Too many users don't know about -D_GLIBCXX_ASSERTIONS and so are missing
valuable checks for C++ standard library preconditions. This change
enables libstdc++ assertions by default when compiling with -O0 so that
we diagnose more bugs by default.

When users enable optimization we don't add the assertions by default
(because they have non-zero overhead) so they still need to enable them
manually.

For users who really don't want the assertions even in unoptimized
builds, defining _GLIBCXX_NO_ASSERTIONS will prevent them from being
enabled automatically.

libstdc++-v3/ChangeLog:

	PR libstdc++/112808
	* doc/xml/manual/using.xml (_GLIBCXX_ASSERTIONS): Document
	implicit definition for -O0 compilation.
	(_GLIBCXX_NO_ASSERTIONS): Document.
	* doc/html/manual/using_macros.html: Regenerate.
	* include/bits/c++config [!__OPTIMIZE__] (_GLIBCXX_ASSERTIONS):
	Define for unoptimized builds.
2024-10-09 13:39:16 +01:00
Jonathan Wakely
6ce1df379a libstdc++: Simplify std::aligned_storage and fix for versioned namespace [PR61458]
This simplifies the implementation of std::aligned_storage. For the
unstable ABI it also fixes the bug where its size is too large when the
default alignment is used. We can't fix that for the stable ABI though,
so just add a comment about the bug.

libstdc++-v3/ChangeLog:

	PR libstdc++/61458
	* doc/doxygen/user.cfg.in (GENERATE_BUGLIST): Set to NO.
	* include/std/type_traits (__aligned_storage_msa): Remove.
	(__aligned_storage_max_align_t): New struct.
	(__aligned_storage_default_alignment): New function.
	(aligned_storage): Use __aligned_storage_default_alignment for
	default alignment. Replace union with a struct containing an
	aligned buffer. Improve Doxygen comment.
	(aligned_storage_t): Use __aligned_storage_default_alignment for
	default alignment.
2024-10-09 13:39:16 +01:00
Jonathan Wakely
2eaae1bd69 libstdc++: Do not cast away const-ness in std::construct_at (LWG 3870)
This change also requires implementing the proposed resolution of LWG
3216 so that std::make_shared and std::allocate_shared still work, and
the proposed resolution of LWG 3891 so that std::expected still works.

libstdc++-v3/ChangeLog:

	* include/bits/shared_ptr_base.h: Remove cv-qualifiers from
	type managed by _Sp_counted_ptr_inplace, as per LWG 3210.
	* include/bits/stl_construct.h: Do not cast away cv-qualifiers
	when passing pointer to placement new.
	* include/std/expected: Use remove_cv_t for union member, as per
	LWG 3891.
	* testsuite/20_util/allocator/void.cc: Do not test construction
	via const pointer.
2024-10-09 13:39:15 +01:00
Jonathan Wakely
993deb3a9a libstdc++: Make std::construct_at support arrays (LWG 3436)
The issue was approved at the recent St. Louis meeting, requiring
support for bounded arrays, but only without arguments to initialize the
array elements.
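
A minimal sketch of what array support without initializer arguments can look like. The function my_construct_at is a hypothetical stand-in written for illustration, not the library's std::construct_at; it assumes C++17 for if constexpr.

```cpp
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::construct_at (illustration only).
template<typename T, typename... Args>
T* my_construct_at(T* loc, Args&&... args)
{
  void* p = loc;
  if constexpr (std::is_array_v<T>)
    {
      // Arrays may only be value-initialized: no arguments allowed.
      static_assert(sizeof...(Args) == 0,
                    "array elements cannot be initialized with arguments");
      // Creating a one-element array of T yields a T*, which points
      // at the newly value-initialized array.
      return ::new (p) T[1]();
    }
  else
    return ::new (p) T(std::forward<Args>(args)...);
}
```

With this sketch, calling my_construct_at(&buf) on an int buf[3] value-initializes all three elements, while my_construct_at(&buf, 1) is rejected at compile time by the static_assert.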

libstdc++-v3/ChangeLog:

	* include/bits/stl_construct.h (construct_at): Support array
	types (LWG 3436).
	* testsuite/20_util/specialized_algorithms/construct_at/array.cc:
	New test.
	* testsuite/20_util/specialized_algorithms/construct_at/array_neg.cc:
	New test.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/initlist-opt1.C: Adjust for different diagnostics
	from std::construct_at by adding -fconcepts-diagnostics-depth=2.
2024-10-09 13:39:15 +01:00
Jonathan Wakely
ce89d2f317
libstdc++: Tweak %c formatting for chrono types
libstdc++-v3/ChangeLog:

	* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add
	[[unlikely]] attribute to condition for missing %c format in
	locale. Use %T instead of %H:%M:%S in fallback.
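
For illustration only: %T is defined as shorthand for %H:%M:%S, so the fallback's output should be unchanged. The sketch below demonstrates the equivalence with the C library's strftime rather than with the chrono formatter itself; format_tm and sample_time are hypothetical helper names.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Format a broken-down time with the given strftime pattern.
std::string format_tm(const std::tm& t, const char* pattern)
{
  char buf[64];
  std::size_t n = std::strftime(buf, sizeof buf, pattern, &t);
  return std::string(buf, n);
}

// A fixed sample time of day, 13:39:15.
std::tm sample_time()
{
  std::tm t{};
  t.tm_hour = 13;
  t.tm_min = 39;
  t.tm_sec = 15;
  t.tm_mday = 1;
  return t;
}
```

Formatting sample_time() with "%T" and with "%H:%M:%S" gives the same string.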
2024-10-09 13:39:15 +01:00
Jonathan Wakely
b349c651ff
libstdc++: Fix formatting of chrono::duration with character rep [PR116755]
Implement Peter Dimov's suggestion for resolving LWG 4118, which is to
use +d.count() so that character types are promoted to an integer type
before formatting them. This didn't have unanimous consensus in the
committee as Howard Hinnant proposed that we should format the rep
consistently with std::format("{}", d.count()) instead. That ends up
being more complicated, because it makes std::formattable a precondition
of operator<< which was not previously the case, and it means that
ios_base::fmtflags from the stream would be ignored because std::format
doesn't use them.
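
A small sketch of the promotion that +d.count() relies on. The helper names stream_raw and stream_promoted are hypothetical, and the expected digits assume an ASCII execution character set.

```cpp
#include <chrono>
#include <sstream>
#include <string>
#include <type_traits>

// Unary plus applies the integral promotions, so a char becomes an int.
static_assert(std::is_same_v<decltype(+char{}), int>,
              "unary plus promotes char to int");

// Insert the count as-is: a char rep is written as a character.
template<typename Rep, typename Period>
std::string stream_raw(const std::chrono::duration<Rep, Period>& d)
{
  std::ostringstream os;
  os << d.count();
  return os.str();
}

// Insert the promoted count: a char rep is written as a number.
template<typename Rep, typename Period>
std::string stream_promoted(const std::chrono::duration<Rep, Period>& d)
{
  std::ostringstream os;
  os << +d.count();
  return os.str();
}
```

For a std::chrono::duration<char> holding 'A', stream_raw produces the character while stream_promoted produces its numeric value, which is the behaviour the fix is after. Because the promoted value still goes through the stream's own inserter, the stream's format flags continue to apply.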

libstdc++-v3/ChangeLog:

	PR libstdc++/116755
	* include/bits/chrono_io.h (operator<<): Use +d.count() for
	duration inserter.
	(__formatter_chrono::_M_format): Likewise for %Q format.
	* testsuite/20_util/duration/io.cc: Test durations with
	character types as reps.
2024-10-09 13:39:15 +01:00
Richard Biener
55dbb4b526 Clear DR_GROUP_NEXT_ELEMENT upon group dissolving
I've tried to sanitize DR_GROUP_NEXT_ELEMENT accesses but there are too
many so the following instead makes sure DR_GROUP_NEXT_ELEMENT is never
non-NULL for !STMT_VINFO_GROUPED_ACCESS.

	* tree-vect-data-refs.cc (vect_analyze_data_ref_access): When
	cancelling a DR group also clear DR_GROUP_NEXT_ELEMENT.
2024-10-09 14:37:26 +02:00
Richard Biener
72c83f644d tree-optimization/117041 - fix load classification of former grouped load
When we first detect a grouped load but later disassociate it we
only set DR_GROUP_FIRST_ELEMENT to NULL, indicating it is not a
STMT_VINFO_GROUPED_ACCESS but leave DR_GROUP_NEXT_ELEMENT set.  This
causes a stray DR_GROUP_NEXT_ELEMENT access in get_group_load_store_type
to go wrong, indicating a load isn't single_element_p when it actually
is, leading to wrong classification and an ICE.
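
The failure mode can be modelled with a heavily simplified sketch. The struct and function names below are hypothetical stand-ins for the vectorizer's data structures and macros, not GCC code.

```cpp
// Hypothetical, simplified model of per-statement group links.
struct stmt_info
{
  stmt_info* first_element = nullptr; // models DR_GROUP_FIRST_ELEMENT
  stmt_info* next_element = nullptr;  // models DR_GROUP_NEXT_ELEMENT
};

// Models STMT_VINFO_GROUPED_ACCESS: grouped iff a first element is set.
bool grouped_access_p(const stmt_info& s)
{
  return s.first_element != nullptr;
}

// Dissolve a group the way described above: only the first-element
// link is cleared, the next-element link is left stale.
void dissolve_first_only(stmt_info& s)
{
  s.first_element = nullptr;
}

// Unchecked query: trusts the next-element link unconditionally.
bool single_element_unchecked(const stmt_info& s)
{
  return s.next_element == nullptr;
}

// Checked query: consults the next-element link only for grouped
// accesses, so a stale link on a dissolved group is ignored.
bool single_element_checked(const stmt_info& s)
{
  return !grouped_access_p(s) || s.next_element == nullptr;
}
```

After dissolve_first_only, the unchecked query still reports that the statement has a successor, while the checked query correctly treats it as a single element.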

	PR tree-optimization/117041
	* tree-vect-stmts.cc (get_group_load_store_type): Only
	check DR_GROUP_NEXT_ELEMENT for STMT_VINFO_GROUPED_ACCESS.

	* gcc.dg/torture/pr117041.c: New testcase.
2024-10-09 14:37:25 +02:00
Torbjörn SVENSSON
cf08dd297c testsuite: arm: use effective-target for vsel*, mod* and pr65647.c tests
Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog

	* gcc.target/arm/pr65647.c: Use effective-target arm_arch_v6m.
	Remove unneeded dg-skip-if.
	* gcc.target/arm/mod_2.c: Use effective-target arm_cpu_cortex_a57.
	* gcc.target/arm/mod_256.c: Likewise.
	* gcc.target/arm/vseleqdf.c: Likewise.
	* gcc.target/arm/vseleqsf.c: Likewise.
	* gcc.target/arm/vselgedf.c: Likewise.
	* gcc.target/arm/vselgesf.c: Likewise.
	* gcc.target/arm/vselgtdf.c: Likewise.
	* gcc.target/arm/vselgtsf.c: Likewise.
	* gcc.target/arm/vselledf.c: Likewise.
	* gcc.target/arm/vsellesf.c: Likewise.
	* gcc.target/arm/vselltdf.c: Likewise.
	* gcc.target/arm/vselltsf.c: Likewise.
	* gcc.target/arm/vselnedf.c: Likewise.
	* gcc.target/arm/vselnesf.c: Likewise.
	* gcc.target/arm/vselvcdf.c: Likewise.
	* gcc.target/arm/vselvcsf.c: Likewise.
	* gcc.target/arm/vselvsdf.c: Likewise.
	* gcc.target/arm/vselvssf.c: Likewise.
	* lib/target-supports.exp: Define effective-target arm_cpu_cortex_a57.
	Update effective-target arm_v8_1_lob_ok to use -mcpu=unset.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-10-09 13:47:08 +02:00
Ken Matsui
f709990333
libcpp: Use ' instead of %< and %> [PR117039]
PR bootstrap/117039

libcpp/ChangeLog:

	* directives.cc (do_pragma_once): Use ' instead of %< and %>.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
2024-10-09 07:33:53 -04:00
René Rebe
68afc7acf6 Enable LRA for ia64
This was tested by bootstrapping GCC natively on ia64-t2-linux-gnu and
running the testsuite (based on 2361160681):

https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html

For comparison, the same with just 2361160681:

https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html

gcc/
	* config/ia64/ia64.cc: Enable LRA for ia64.
	* config/ia64/ia64.md: Likewise.
	* config/ia64/predicates.md: Likewise.

Signed-off-by: René Rebe <rene@exactcode.de>
2024-10-09 11:28:33 +02:00
René Rebe
452b12cea8 Remove ia64*-*-linux from the list of obsolete targets
The following un-deprecates ia64*-*-linux for GCC 15, since we plan to
support it for some years to come.

gcc/
	* config.gcc: Only list ia64*-*-(hpux|vms|elf) in the list of
	obsoleted targets.

contrib/
	* config-list.mk (LIST): No --enable-obsolete for ia64-linux.

Signed-off-by: René Rebe <rene@exactcode.de>
2024-10-09 11:28:20 +02:00
Richard Biener
9df0772d50 tree-optimization/116974 - Handle single-lane SLP for OMP scan store
The following massages the GIMPLE matching way of handling scan
stores to work with single-lane SLP.  I do not fully understand all
the cases that can happen and the stmt matching at vectorizable_store
time is less than ideal - but the following gets me all the testcases
to pass with and without forced SLP.

Long term we want to perform the matching at SLP discovery time,
properly chaining the various SLP instances the current state ends
up with.

	PR tree-optimization/116974
	* tree-vect-stmts.cc (check_scan_store): Pass in the SLP node
	instead of just a flag.  Allow single-lane scan stores.
	(vectorizable_store): Adjust.
	* tree-vect-loop.cc (vect_analyze_loop_2): Empty scan_map
	before re-trying.
2024-10-09 09:56:20 +02:00
Richard Biener
dc90578f0b tree-optimization/116575 - handle SLP of permuted masked loads
The following handles SLP discovery of permuted masked loads which
was prohibited (because it was wrongly handled) for PR114375.  In particular
with single-lane SLP at the moment all masked group loads appear
permuted and we fail to use masked load lanes as well.  The following
addresses parts of the issues, starting with doing correct basic
discovery - namely discover an unpermuted mask load followed by
a permute node.  In particular groups with gaps do not support masking
yet (and didn't before w/o SLP IIRC).  There are still issues with
how we represent masked load/store-lanes, I think, but I first have to
get my hands on a good testcase.
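
As a scalar model of the idea (not vectorizer code), the sketch below checks that an unpermuted masked load followed by a permute gives the same lanes as a directly permuted masked load. All names are hypothetical, and the sketch assumes inactive lanes read as zero.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;
using lanes_t = std::array<int, kLanes>;
using mask_t = std::array<bool, kLanes>;
using perm_t = std::array<std::size_t, kLanes>;

// Unpermuted masked load: lane i is read only where mask[i] is set.
// Assumption: inactive lanes are zero.
lanes_t masked_load(const lanes_t& mem, const mask_t& mask)
{
  lanes_t out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = mask[i] ? mem[i] : 0;
  return out;
}

// Permute node: output lane i takes input lane perm[i].
lanes_t permute(const lanes_t& v, const perm_t& perm)
{
  lanes_t out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = v[perm[i]];
  return out;
}

// Reference: the permuted masked load computed in one step.
lanes_t permuted_masked_load(const lanes_t& mem, const mask_t& mask,
                             const perm_t& perm)
{
  lanes_t out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = mask[perm[i]] ? mem[perm[i]] : 0;
  return out;
}
```

Splitting the operation this way means the load itself never has to know about the permutation, which matches the "full unpermuted load plus VEC_PERM node" shape named in the ChangeLog entry below.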

	PR tree-optimization/116575
	PR tree-optimization/114375
	* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reject
	permuted mask loads without gaps but instead discover a
	node for the full unpermuted load and permute that with
	a VEC_PERM node.

	* gcc.dg/vect/vect-pr114375.c: Expect vectorization now with avx2.
2024-10-09 09:54:42 +02:00