When compare_tests compares both C and C++ tests in c-c++-common, they
get the same identifier, so expected differences in results across
languages become undesirably noisy.
This patch adds tool identifiers to tests, so that runs by different
tools are not confused by the compare logic.
It also fixes a bug in reporting differences, that would attempt to
print an undefined fname (the definitions are in subshell loops), and
adjusts the target insertion to match tabs in addition to blanks after
colons.
for contrib/ChangeLog
* compare_tests: Add tool to test lines. Match tabs besides
blanks to insert tool and target. Don't print undefined fname.
Currently the debug counter sched_block doesn't work well: we create
dependencies for some insns and expect them to be resolved while
scheduling insns, but they can be skipped once we start skipping
blocks to respect the sched_block debug counter.
For example, for the below test case:
--
int a, b, c, e, f;
float d;
void
g ()
{
  float h, i[1];
  for (; f;)
    if (c)
      {
        d *e;
        if (b)
          {
            float *j = i;
            j[0] = 0;
          }
        h = d;
      }
  if (h)
    a = i[0];
}
--
ICE occurs with option "-O2 -fdbg-cnt=sched_block:1".
As discussed in [1], the consensus is that this debug counter is
useless and can be removed. The discussion also implies that, had it
been useful and often used, the above issue would have been noticed
and resolved earlier. So this patch removes the debug counter.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635852.html
gcc/ChangeLog:
* dbgcnt.def (sched_block): Remove.
* sched-rgn.cc (schedule_region): Remove the support of debug counter
sched_block.
PR37722 complains that a computed goto doesn't destroy objects that go out
of scope. In the PR we agreed that it never will, but we can warn about
it.
PR c++/37722
gcc/ChangeLog:
* doc/extend.texi: Document that computed goto does not
call destructors.
gcc/cp/ChangeLog:
* decl.cc (identify_goto): Adjust for computed goto.
(struct named_label_use_entry): Add computed_goto field.
(poplevel_named_label_1): Add to computed_goto vec.
(check_previous_goto_1): Dump computed_goto vec.
(check_goto_1): Split out from check_goto.
(check_goto): Check all addressable labels for computed goto.
(struct named_label_entry): Add addressed field.
(mark_label_addressed): New.
* parser.cc (cp_parser_unary_expression): Call it.
* cp-tree.h (mark_label_addressed): Declare it.
gcc/testsuite/ChangeLog:
* g++.dg/ext/label15.C: New test.
* g++.dg/torture/pr42739.C: Expect warning.
-Werror=foo implying -Wfoo wasn't working for -Wdeprecated-copy-dtor,
because it is specified as the value 2 of warn_deprecated_copy, which shows
up as CLVC_EQUAL, which is not one of the three var_types handled by
control_warning_option. It seems to me that we can just unconditionally
handle_generated_option, and only have special argument handling for those
types.
PR c++/106213
gcc/ChangeLog:
* opts-common.cc (control_warning_option): Call
handle_generated_option for all cl_var_types.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/depr-copy5.C: New test.
I thought it could be easier to use check_GNU_style.py. With this alias,
'git gcc-style' will take a git revision as argument instead of a file, or
check HEAD if no argument is given.
contrib/ChangeLog:
* gcc-git-customization.sh: Add git gcc-style alias.
While trying to fix the bugs in PR113087, I noticed that we generate a
redundant vsetvli in the following situation:
_255 = SELECT_VL (3, POLY_INT_CST [4, 4]);
COND_LEN (..., _255)
Before this patch:
vsetivli a5, 3...
...
vadd.vv (use a5)
After this patch:
...
vadd.vv (use AVL = 3)
We can do this because known_le (3, [4,4]) is true, so it is safe to
apply this optimization.
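To make the comparison concrete: POLY_INT_CST [4, 4] denotes 4 + 4x for a runtime x >= 0, so the constant 3 can never exceed it. The following is a minimal C sketch of that check, not the code in riscv-v.cc. The names struct poly2, const_known_le and fold_select_vl are hypothetical, and the sketch assumes that SELECT_VL returns its first operand whenever that operand is known not to exceed the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a two-coefficient poly_int: the value
   c0 + c1 * x, where x >= 0 is only known at run time.  */
struct poly2
{
  long c0;
  long c1;
};

/* True if the constant A is <= P for every x >= 0.  Since x is
   unbounded above, this holds exactly when A <= c0 and c1 >= 0.  */
bool
const_known_le (long a, struct poly2 p)
{
  return a <= p.c0 && p.c1 >= 0;
}

/* Assumption: SELECT_VL (avl, vf) yields avl whenever avl is known to
   be <= vf.  Return true and store the folded value in *RESULT if the
   fold applies; otherwise leave *RESULT alone and return false.  */
bool
fold_select_vl (long avl, struct poly2 vf, long *result)
{
  assert (result != NULL);
  if (!const_known_le (avl, vf))
    return false;
  *result = avl;
  return true;
}
```

With vf = {4, 4}, an AVL of 3 folds to the constant 3, while an AVL of 5 does not fold because 5 > 4 when x is 0.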
Full coverage testing on both RV32 and RV64 shows no regression.
PR target/113087
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_select_vl): Optimize SELECT_VL.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113087-2.c: New test.
This patch fixes a bug in the fusion of the following case:
li a5,-1
vmv.s.x v0,a5 -> demand any non-zero AVL
vsetvli a5, ...
Incorrect fusion after VSETVL PASS:
li a5,-1
vsetvli a5...
vmv.s.x v0, a5 --> a5 has been overwritten with an incorrect value.
This patch disallows the incorrect fusion above.
Full coverage testing on RV64 and RV32 shows no regression.
PR target/113087
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc: Disallow fusion when VL modification pollutes non AVL use.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113087-1.c: New test.
This patch works around behaviour of the 2D and 3D memcpy operations in
the CUDA driver runtime. Particularly in Fortran, the "base pointer"
of an array (used for either source or destination of a host/device copy)
may lie outside of data that is actually stored on the device. The fix
is to make sure that we use the first element of data to be transferred
instead, and adjust parameters accordingly.
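The adjustment described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code in plugin-nvptx.c: struct copy2d_side and rebase_to_first_element are hypothetical names, and the offsets are assumed to be a byte offset within a row plus a row count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one side of a 2D copy: BASE plus a byte
   offset X_BYTES within each row and a row offset Y_ROWS, with rows
   PITCH bytes apart.  These names are illustrative, not libgomp's.  */
struct copy2d_side
{
  const char *base;
  size_t x_bytes;
  size_t y_rows;
  size_t pitch;
};

/* Fold the offsets into the pointer so that it addresses the first
   byte actually transferred, and zero the offsets to compensate.  The
   region described is unchanged; only its representation differs.  */
void
rebase_to_first_element (struct copy2d_side *s)
{
  assert (s != NULL);
  s->base += s->y_rows * s->pitch + s->x_bytes;
  s->x_bytes = 0;
  s->y_rows = 0;
}
```

After the call, the pointer handed to the copy routine always lies inside the data being transferred, which is the property the workaround relies on.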
2023-10-02 Julian Brown <julian@codesourcery.com>
libgomp/
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
avoid out-of-bounds array checks in CUDA runtime.
(GOMP_OFFLOAD_memcpy3d): Likewise.
* testsuite/libgomp.c-c++-common/memcpyxd-bias-1.c: New test.
gcc/ChangeLog:
* doc/invoke.texi: Document the new file extensions
gcc/fortran/ChangeLog:
PR fortran/81615
* lang-specs.h (F951_CPP_OPTIONS): Do not hardcode ".f90" extension
(F951_CPP_EXTENSION): Use .fi/.fii for fixed/free form sources
* options.cc (form_from_filename): Handle the new extensions
Signed-off-by: Rimvydas Jasinskas <rimvydas.jas@gmail.com>
If cse sees:
(set (reg R) (const_vector [A B ...]))
it creates fake sets of the form:
(set R[0] A)
(set R[1] B)
...
(with R[n] replaced by appropriate rtl) and then adds them to the tables
in the same way as for normal sets. This allows a sequence like:
(set (reg R2) A)
...(reg R2)...
to try to use R[0] instead of (reg R2).
But the pass was taking the analogy too far, and was trying to simplify
these fake sets based on costs. That is, if there was an earlier:
(set (reg T) A)
the pass would go to considerable effort trying to work out whether:
(set R[0] A)
or:
(set R[0] (reg T))
was more profitable. This included running validate*_change on the sets,
which has no meaning given that the sets are not part of the insn.
In this example, the equivalence A == T is already known, and the
purpose of the fake sets is to add A == T == R[0]. We can do that
just as easily (or, as the PR shows, more easily) if we keep the
original form of the fake set, with A instead of T.
The problem in the PR occurred if we had:
(1) something that establishes an equivalence between a vector V1 of
M-bit scalar integers and a hard register H
(2) something that establishes an equivalence between a vector V2 of
N-bit scalar integers, where N<M and where V2 contains at least 2
instances of V1[0]
(1) established an equivalence between V1[0] and H in M bits.
(2) then triggered a search for an equivalence of V1[0] in N bits.
This included:
/* See if we have a CONST_INT that is already in a register in a
wider mode. */
which (correctly) found that the low N bits of H contain the right value.
But because it came from a wider mode, this equivalence between N-bit H
and N-bit V1[0] was not yet in the hash table. It therefore survived
the purge in:
/* At this point, ELT, if nonzero, points to a class of expressions
equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
and SRC_RELATED, if nonzero, each contain additional equivalent
expressions. Prune these latter expressions by deleting expressions
already in the equivalence class.
And since more than 1 set found the same N-bit equivalence between
H and V1[0], the pass tried to add it more than once.
Things were already wrong at this stage, but an ICE was only triggered
later when trying to merge this N-bit equivalence with another one.
We could avoid the double registration by adding:
  for (elt = classp; elt; elt = elt->next_same_value)
    if (rtx_equal_p (elt->exp, x))
      return elt;
to insert_with_costs, or by making cse_insn check whether previous
sets have recorded the same equivalence. The latter seems more
appealing from a compile-time perspective. But in this case,
doing that would be adding yet more spurious work to the handling
of fake sets.
The handling of fake sets therefore seems like the more fundamental bug.
While there, the patch also makes sure that we don't apply REG_EQUAL
notes to these fake sets. They only describe the "real" (first) set.
gcc/
PR rtl-optimization/111702
* cse.cc (set::mode): Move earlier.
(set::src_in_memory, set::src_volatile): Convert to bitfields.
(set::is_fake_set): New member variable.
(add_to_set): Add an is_fake_set parameter.
(find_sets_in_insn): Update calls accordingly.
(cse_insn): Do not apply REG_EQUAL notes to fake sets. Do not
try to optimize them either, or validate changes to them.
gcc/testsuite/
PR rtl-optimization/111702
* gcc.dg/rtl/aarch64/pr111702.c: New test.
maybe_splice_retval_cleanup assumed that the function body can't be empty if
there's a throwing cleanup, but when I added cleanups to try blocks in
r12-6333-gb10e031458d541 I didn't adjust that assumption.
PR c++/113088
PR c++/33799
gcc/cp/ChangeLog:
* except.cc (maybe_splice_retval_cleanup): Handle an empty block.
gcc/testsuite/ChangeLog:
* g++.dg/eh/return2.C: New test.
Normally we handle xvalue array subscripting with ARRAY_REF, but in this
case we weren't doing that because the operands were reversed. Handle that
case better.
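For readers unfamiliar with the swapped form: E1[E2] is defined as *((E1) + (E2)), so the index may be written first. The sketch below shows only that syntax in plain C; it does not reproduce the C++ xvalue array case from the PR, and get_normal and get_swapped are hypothetical names.

```c
#include <assert.h>

/* Read element I with the conventional operand order.  */
int
get_normal (const int *a, int i)
{
  return a[i];
}

/* Same access with the operands swapped.  Because E1[E2] means
   *((E1) + (E2)) and addition commutes, both forms read the same
   element.  */
int
get_swapped (const int *a, int i)
{
  return i[a];
}
```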
PR c++/103185
gcc/cp/ChangeLog:
* typeck.cc (cp_build_array_ref): Handle swapped operands.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/array-prvalue2.C: New test.
* g++.dg/cpp1z/eval-order3.C: Test swapped operands.
This patch addresses the issue reported in PR target/112787 by improving the
compute type selection. We do this by not considering types with more elements
than the type we are lowering since we'd reject such types anyway.
gcc/ChangeLog:
PR target/112787
* tree-vect-generic.cc (type_for_widest_vector_mode): Change function to
use original vector type and check widest vector mode has at most the
same number of elements.
(get_compute_type): Pass original vector type rather than the element
type to type_for_widest_vector_mode and remove now obsolete check for
the number of elements.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr112787.c: New test.
This matches other compiler diagnostics. No test updates are needed
because c-c++-common/pr95378.c does not match a specific -W option.
Fixes commit d2384b7b24 ("c-family:
check qualifiers of arguments to __atomic built-ins (PR 95378)").
gcc/c-family/
PR c/113050
* c-common.cc (get_atomic_generic_size): Use
OPT_Wdiscarded_qualifiers instead of
OPT_Wincompatible_pointer_types.
Narrow down scope of the unknowns bitmap so that it is only accessible
within the reexamination process. This also removes any role of unknown
propagation from object_sizes_set, thus simplifying that code path a
bit.
gcc/ChangeLog:
* tree-object-size.cc (object_size_info): Remove UNKNOWNS.
Drop all references to it.
(object_sizes_set): Move unknowns propagation code to...
(gimplify_size_expressions): ... here. Also free reexamine
bitmap.
(propagate_unknowns): New parameter UNKNOWNS. Update callers.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
Like commit d94fae044d "No libstdc++ for nvptx"
(2015) and elsewhere.
Based on commit 5f1bed2a7a (2023-12-16), there
are a ton of progressions (for test cases not actually depending on libstdc++
symbols, obviously):
=== g++ Summary ===
# of expected passes [-178369-]{+189226+}
# of unexpected failures [-19880-]{+14089+}
# of unexpected successes 14
# of expected failures [-1684-]{+1685+}
# of unresolved testcases [-9820-]{+4837+}
# of unsupported tests [-11971-]{+11968+}
..., and only two benign "regressions":
[-UNSUPPORTED:-]{+FAIL:+} g++.dg/init/array54.C -std=c++14 {+(test for excess errors)+}
{+UNRESOLVED: g++.dg/init/array54.C -std=c++14 compilation failed to produce executable+}
[Etc.]
[...]/g++.dg/init/array54.C:5:10: fatal error: atomic: No such file or directory
That's similar to a lot of other test cases intending to '#include' standard
C++/libstdc++ headers; to be addressed in due time.
PASS: g++.old-deja/g++.pt/const2.C -std=c++98 at line 5 (test for warnings, line )
[-PASS:-]{+FAIL:+} g++.old-deja/g++.pt/const2.C -std=c++98 (test for excess errors)
[Etc.]
ld: error: undefined symbol: A<int>::i
>>> referenced by /tmp/ccqXWCSh.o:(p)
The 'error: undefined symbol' is expected here; maybe the test case should
simply use 'dg-prune-output "referenced by"'? (This PASSed before, as the
'dg-message "i"' was satisfied by 'ld: error: unable to find library -lstdc++',
eh...)
gcc/
* config/gcn/gcn.h (LIBSTDCXX): Define to "gcc".
gcc.dg/vect/bb-slp-pr78205.c is reported to have regressed with
the PR113073 change and in the end it's due to the DCE performed
by vect_transform_slp_perm_load_1 being imperfect. The following
enhances it to also cover the CTOR and VIEW_CONVERT operations that
might be involved.
* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Also handle
CTOR and VIEW_CONVERT up to the load when performing chain DCE.
No functional change; just clean up the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_expand_vector_init_same): Remove "temp2" and reuse
"temp" instead.
(loongarch_expand_vector_init): Use gcc_unreachable () instead
of gcc_assert (0), and fix the comment for it.
Jakub says:
Then that seems like a bug in the loongarch vec_init pattern(s).
Those really don't have a predicate in any of the backends on the
input operand, so they need to force_reg it if it is something it
can't handle. I've looked e.g. at i386 vec_init and that is exactly
what it does, see the various tests + force_reg calls in
ix86_expand_vector_init*.
So replace gen_reg_rtx + emit_move_insn with force_reg to fix PR 113033.
gcc/ChangeLog:
PR target/113033
* config/loongarch/loongarch.cc
(loongarch_expand_vector_init_same): Replace gen_reg_rtx +
emit_move_insn with force_reg.
(loongarch_expand_vector_init): Likewise.
gcc/testsuite/ChangeLog:
PR target/113033
* gcc.target/loongarch/pr113033.c: New test.
We had the following mappings between <x>vfcmp submnemonics and RTX
codes:
(define_code_attr fcc
[(unordered "cun")
(ordered "cor")
(eq "ceq")
(ne "cne")
(uneq "cueq")
(unle "cule")
(unlt "cult")
(le "cle")
(lt "clt")])
This is inconsistent with scalar code:
(define_code_attr fcond [(unordered "cun")
(uneq "cueq")
(unlt "cult")
(unle "cule")
(eq "ceq")
(lt "slt")
(le "sle")
(ordered "cor")
(ltgt "sne")
(ne "cune")
(ge "sge")
(gt "sgt")
(unge "cuge")
(ungt "cugt")])
For every RTX code for which the LSX/LASX code is different from the
scalar code, the scalar code is correct and the LSX/LASX code is wrong.
Most seriously, the RTX code NE should be mapped to "cune", not "cne".
Rewrite <x>vfcmp define_insns in simd.md using the same mapping as
scalar fcmp.
Note that GAS does not support the [x]vfcmp.{c/s}[u]{ge/gt} (pseudo)
instructions (although fcmp.{c/s}[u]{ge/gt} is supported), so we need to
switch the order of inputs and use [x]vfcmp.{c/s}[u]{le/lt} instead.
The <x>vfcmp.{sult/sule/clt/cle}.{s/d} instructions do not have a single
RTX code, but they can be modeled as an inverted RTX code following a
"not" operation. Doing so allows the compiler to optimize vectorized
__builtin_isless etc. to a single instruction. This optimization should
be added for scalar code too and I'll do it later.
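The two points above (NE is true for unordered operands, and a quiet "less than" is the inverse of UNGE) can be checked in scalar C. This is a sketch of the semantics only, not of the machine description; rtx_ne, rtx_unge and lt_via_not_unge are hypothetical names, and the pairing of clt with "not UNGE" is my reading of the paragraph above.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* RTX NE on floats: true if the operands are unordered or not equal,
   which is also what C's != computes.  */
bool
rtx_ne (double a, double b)
{
  return a != b;
}

/* RTX UNGE: unordered, or greater than or equal.  The isunordered
   test comes first, so >= is never evaluated on a NaN.  */
bool
rtx_unge (double a, double b)
{
  return isunordered (a, b) || a >= b;
}

/* "Less than" that is false for NaN operands, written as the inverse
   of UNGE: an inverted RTX code under a "not".  */
bool
lt_via_not_unge (double a, double b)
{
  return !rtx_unge (a, b);
}
```

For all inputs, including NaN, lt_via_not_unge agrees with the standard isless macro, which is why a compare that implements "not UNGE" can serve __builtin_isless directly.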
Tests are added for mapping between C code, IEC 60559 operations, and
vfcmp instructions.
[1]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640713.html
gcc/ChangeLog:
PR target/113034
* config/loongarch/lasx.md (UNSPEC_LASX_XVFCMP_*): Remove.
(lasx_xvfcmp_caf_<flasxfmt>): Remove.
(lasx_xvfcmp_cune_<FLASX:flasxfmt>): Remove.
(FSC256_UNS): Remove.
(fsc256): Remove.
(lasx_xvfcmp_<vfcond:fcc>_<FLASX:flasxfmt>): Remove.
(lasx_xvfcmp_<fsc256>_<FLASX:flasxfmt>): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_XVFCMP_*): Remove.
(lsx_vfcmp_caf_<flsxfmt>): Remove.
(lsx_vfcmp_cune_<FLSX:flsxfmt>): Remove.
(vfcond): Remove.
(fcc): Remove.
(FSC_UNS): Remove.
(fsc): Remove.
(lsx_vfcmp_<vfcond:fcc>_<FLSX:flsxfmt>): Remove.
(lsx_vfcmp_<fsc>_<FLSX:flsxfmt>): Remove.
* config/loongarch/simd.md
(fcond_simd): New define_code_iterator.
(<simd_isa>_<x>vfcmp_<fcond:fcond_simd>_<simdfmt>):
New define_insn.
(fcond_simd_rev): New define_code_iterator.
(fcond_rev_asm): New define_code_attr.
(<simd_isa>_<x>vfcmp_<fcond:fcond_simd_rev>_<simdfmt>):
New define_insn.
(fcond_inv): New define_code_iterator.
(fcond_inv_rev): New define_code_iterator.
(fcond_inv_rev_asm): New define_code_attr.
(<simd_isa>_<x>vfcmp_<fcond_inv>_<simdfmt>): New define_insn.
(<simd_isa>_<x>vfcmp_<fcond_inv:fcond_inv_rev>_<simdfmt>):
New define_insn.
(UNSPEC_SIMD_FCMP_CAF, UNSPEC_SIMD_FCMP_SAF,
UNSPEC_SIMD_FCMP_SEQ, UNSPEC_SIMD_FCMP_SUN,
UNSPEC_SIMD_FCMP_SUEQ, UNSPEC_SIMD_FCMP_CNE,
UNSPEC_SIMD_FCMP_SOR, UNSPEC_SIMD_FCMP_SUNE): New unspecs.
(SIMD_FCMP): New define_int_iterator.
(fcond_unspec): New define_int_attr.
(<simd_isa>_<x>vfcmp_<fcond_unspec>_<simdfmt>): New define_insn.
* config/loongarch/loongarch.cc (loongarch_expand_lsx_cmp):
Remove unneeded special cases.
gcc/testsuite/ChangeLog:
PR target/113034
* gcc.target/loongarch/vfcmp-f.c: New test.
* gcc.target/loongarch/vfcmp-d.c: New test.
* gcc.target/loongarch/xvfcmp-f.c: New test.
* gcc.target/loongarch/xvfcmp-d.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Scan for cune
instead of cne.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Likewise.
For the stmt _1 = _2 + _3, assume that _2 or _3 is not used after this stmt.
Then _1 can share a register with _2 or _3 if there is no early clobber.
Only two registers are needed, but the current calculation gives three.
This patch preserves point 0 for bb entry and excludes a point's def when
calculating the live regs at that point.
Signed-off-by: demin.han <demin.han@starfivetech.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Fix
max live vregs calc
(preferred_new_lmul_p): Ditto
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: ...here.
The following patch makes most x86 MD builtins nothrow,leaf
(like most middle-end builtins are). With -fnon-call-exceptions it
doesn't add nothrow; it might be better to still add it if the builtins
don't read or write memory and can't raise floating point exceptions,
but we don't have such information readily available, so the patch
uses just !flag_non_call_exceptions for now.
Not sure if we shouldn't have some exceptions for the leaf attribute,
e.g. wonder about EMMS/FEMMS and the various xsave/xrstor etc. builtins,
pedantically none of those builtins do anything that leaf functions
are forbidden to do (having callbacks, calling functions from current TU,
longjmp into the current TU), but sometimes non-leaf is also used on
really complex functions to prevent some unwanted optimizations.
That said, haven't run into any problems as is with the patch.
2023-12-20 Jakub Jelinek <jakub@redhat.com>
PR target/112962
* config/i386/i386-builtins.cc (ix86_builtins): Increase by one
element.
(def_builtin): If not -fnon-call-exceptions, set TREE_NOTHROW on
the builtin FUNCTION_DECL. Add leaf attribute to DECL_ATTRIBUTES.
(ix86_add_new_builtins): Likewise.
The following patch fixes 2 issues in handling of casts for mergeable
stmts.
The first hunk fixes the case when we have two nested casts (typically
after optimization that is zero-extension of a sign-extension because
everything else should have been folded into a single cast). If
the lowering of the outer cast needs to make the code conditional
(e.g.
  for (...)
    {
      if (idx <= 32)
        {
          if (idx < 32)
            { ... handle_operand (idx); ... }
          else
            { ... handle_operand (32); ... }
        }
      ...
    }
) and the lowering of the inner one as well, right now it creates invalid
SSA form, because even for the inner cast we need a PHI on the loop
and the PHI argument from the latch edge is a SSA_NAME initialized in
the conditionally executed bb. The hunk fixes that by detecting such
a case and adding further PHI nodes at the end of the ifs such that
the right value propagates to the next loop iteration. We can use
0 arguments for the other edges because the inner operand handling
is only done for the first set of iterations and then the other ifs take
over.
The rest fixes a case of again invalid SSA form, when for a sign extension
we need to use the 0 or -1 value initialized by earlier iteration in
a constant idx case, the code was using the value of the loop PHI argument
from latch edge rather than result; that is correct for cases expanded
in straight line code after the loop, but not inside of the loop for the
cases of handle_cast conditionals, there we should use PHI result. This
is done in the second hunk and supported by the remaining hunks, where
it clears m_bb to tell the code we aren't in the loop anymore.
Note, this patch doesn't deal with similar problems during multiplication,
division, floating casts etc. where we just emit a library call. I'll
need to make sure in that case we don't merge more than one cast per
operand.
2023-12-20 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112941
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): If
save_cast_conditional, instead of adding assignment of t4 to
m_data[save_data_cnt + 1] before m_gsi, add phi nodes such that
t4 propagates to m_bb loop. For constant idx, use
m_data[save_data_cnt] rather than m_data[save_data_cnt + 1] if inside
of the m_bb loop.
(bitint_large_huge::lower_mergeable_stmt): Clear m_bb when no longer
expanding inside of that loop.
(bitint_large_huge::lower_comparison_stmt): Likewise.
(bitint_large_huge::lower_addsub_overflow): Likewise.
(bitint_large_huge::lower_mul_overflow): Likewise.
(bitint_large_huge::lower_bit_query): Likewise.
* gcc.dg/bitint-55.c: New test.
The following patch changes the -Walloc-size warning to no longer warn
about int *p = calloc (1, sizeof (int));, because as discussed earlier,
the size is IMNSHO sufficient in that case; for alloc_size with 2
arguments it warns if the product of the 2 arguments is too small.
It also now warns for explicit casts of malloc/calloc etc. calls
rather than just implicit conversions, so not just
int *p = malloc (1);
but also
int *p = (int *) malloc (1);
It also fixes some ICEs where the code didn't verify the alloc_size
arguments properly (Walloc-size-5.c testcase ICEs with vanilla trunk).
And lastly, it introduces a coding style warning, -Wcalloc-transposed-args
to warn for calloc (sizeof (struct S), 1) and similar calls (regardless
of what they are cast to, warning whenever first argument is sizeof and
the second is not).
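The calls discussed above look like this in practice. The sketch is plain C with hypothetical helper names (alloc_one_int, alloc_one_s); the comments about which calls warn restate the description above rather than verified compiler output.

```c
#include <assert.h>
#include <stdlib.h>

struct S
{
  int a;
  int b;
};

/* Element count first, element size second.  The product
   1 * sizeof (int) is enough for an int, so per the description above
   -Walloc-size is not expected to warn here.  */
int *
alloc_one_int (void)
{
  return calloc (1, sizeof (int));
}

/* Conventional order for one struct S.  The transposed spelling,
   calloc (sizeof (struct S), 1), allocates the same number of bytes
   but is what -Wcalloc-transposed-args is described as flagging,
   since sizeof appears as the first argument and not the second.  */
struct S *
alloc_one_s (void)
{
  return calloc (1, sizeof (struct S));
}
```

Both helpers return zero-initialized storage that the caller must release with free.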
2023-12-20 Jakub Jelinek <jakub@redhat.com>
gcc/
* doc/invoke.texi (-Walloc-size): Add to the list of
warning options, remove unnecessary line-break.
(-Wcalloc-transposed-args): Document new warning.
gcc/c-family/
* c.opt (Wcalloc-transposed-args): New warning.
* c-common.h (warn_for_calloc, warn_for_alloc_size): Declare.
* c-warn.cc (warn_for_calloc, warn_for_alloc_size): New functions.
gcc/c/
* c-parser.cc (c_parser_postfix_expression_after_primary): Grow
sizeof_arg and sizeof_arg_loc arrays to 6 elements. Call
warn_for_calloc if warn_calloc_transposed_args for functions with
alloc_size type attribute with 2 arguments.
(c_parser_expr_list): Use 6 instead of 3.
* c-typeck.cc (build_c_cast): Call warn_for_alloc_size for casts
of calls to functions with alloc_size type attribute.
(convert_for_assignment): Likewise.
gcc/testsuite/
* gcc.dg/Walloc-size-4.c: New test.
* gcc.dg/Walloc-size-5.c: New test.
* gcc.dg/Wcalloc-transposed-args-1.c: New test.
We were missing validation of the candidate register operands in the
ldp/stp pass. I was relying on recog rejecting such cases when we
formed the final pair insn, but the testcase shows that with
-fharden-conditionals we attempt to combine two insns with asm_operands,
both containing mem rtxes. This then trips the assert:
gcc_assert (change->new_uses.is_valid ());
in the stp case as we aren't expecting to have (distinct) uses of mem in
the candidate stores.
While doing this I noticed that it seems more natural to have the
initial definition of mem_size closer to its first use in track_access,
so I moved that down.
gcc/ChangeLog:
PR target/113062
* config/aarch64/aarch64-ldp-fusion.cc
(ldp_bb_info::track_access): Punt on accesses with invalid
register operands, move definition of mem_size closer to its
first use.
gcc/testsuite/ChangeLog:
PR target/113062
* gcc.dg/pr113062.c: New test.
This patch fixes the execution failure below when building with
"-march=rv64gcv_zvl512b -mabi=lp64d -mcmodel=medlow --param=riscv-autovec-lmul=m8 -ftree-vectorize -fno-vect-cost-model -O3"
FAIL: gcc.dg/vect/pr92420.c -flto -ffat-lto-objects execution test
There will be a single-step const vector like { -4, 4, -3, 5, -2, 6, -1, 7, ...}.
For such a single-step const vector, we generate vid + diff. For
example, as below, given npatterns = 4:
v1 = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8... }
v2 (diff) = {3 - 0, 2 - 1, 1 - 2, 0 - 3, 7 - 4, 6 - 5, 5 - 6, 4 - 7...}
= {3, 1, -1, -3, 3, 1, -1, -3 ...}
v1 = v2 + vid.
Unfortunately, that does not work for { -4, 4, -3, 5, -2, 6, -1, 7, ...}
because the approach has an implicit requirement on the diff: the diff
sequence must repeat every npatterns elements, as v2 (diff) does above.
The diff between { -4, 4, -3, 5, -2, 6, -1, 7, ...} and vid does not
repeat with a period of npatterns, so we generate wrong code. This patch
implements a new code generation sequence for vectors like
{ -4, 4, -3, 5, -2, 6, -1, 7, ...}.
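The repetition requirement can be stated as a small predicate. The following C sketch works on a plain array instead of an rvv_builder, and vid_diff_repeated_p is a hypothetical free function that only borrows its name from the ChangeLog entry; it is not the implementation in riscv-v.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the difference between V and the index vector
   vid = {0, 1, 2, ...} repeats with period NPATTERNS over the first N
   elements, i.e. v[i] - i == v[i - npatterns] - (i - npatterns) for
   every i >= npatterns.  */
bool
vid_diff_repeated_p (const long *v, size_t n, size_t npatterns)
{
  assert (v != NULL && npatterns > 0);
  for (size_t i = npatterns; i < n; i++)
    if (v[i] - (long) i != v[i - npatterns] - (long) (i - npatterns))
      return false;
  return true;
}
```

For {3, 2, 1, 0, 7, 6, 5, 4, ...} with npatterns = 4 the predicate holds, so vid + diff is usable. For { -4, 4, -3, 5, -2, 6, -1, 7, ...} with npatterns = 2 the diffs at indices 0 and 2 are -4 and -5, so it fails and a different expansion is needed.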
The tests below pass with this patch.
* The RV64 regression test with rv64gcv configuration.
* The run test gcc.dg/vect/pr92420.c for below configurations.
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8
riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
gcc/ChangeLog:
* config/riscv/riscv-v.cc (rvv_builder::npatterns_vid_diff_repeated_p):
New function to predicate the diff to vid is repeated or not.
(expand_const_vector): Add restriction
for the vid-diff code gen and implement general one.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/bug-7.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
The stack pointer is biased by 2047 bytes on sparc64, so the range it
delimits is way off. Unbias the addresses returned by
__builtin_stack_address (), so that the strub builtins, inlined or
not, can function correctly. I've considered introducing a new target
macro, but using STACK_POINTER_OFFSET seems safe, and it enables the
register save areas to be scrubbed as well.
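The bias arithmetic itself is simple and can be sketched as below. SPARC64_STACK_BIAS and unbias_stack_address are hypothetical names used for illustration; the patch uses STACK_POINTER_OFFSET, which may account for more than the bias alone, so this shows only the 2047-byte part.

```c
#include <assert.h>
#include <stdint.h>

/* Per the description above, the 64-bit SPARC stack pointer is offset
   by this many bytes from the address it denotes.  */
#define SPARC64_STACK_BIAS 2047

/* Convert a biased stack-pointer value into the stack address it
   stands for by adding the bias back.  */
uintptr_t
unbias_stack_address (uintptr_t biased_sp)
{
  return biased_sp + SPARC64_STACK_BIAS;
}
```

Without this step, a range computed from two raw stack-pointer values would be shifted by 2047 bytes relative to the memory that actually needs scrubbing.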
Because of the large fixed-size outgoing args area next to the
register save area on sparc, we still need __strub_leave to not
allocate its own frame, otherwise it won't be able to clear part of
the frame it should.
for gcc/ChangeLog
PR middle-end/112917
* builtins.cc (expand_builtin_stack_address): Add
STACK_POINTER_OFFSET.
* doc/extend.texi (__builtin_stack_address): Adjust.
If we allow __strub_leave to allocate a frame on sparc, it will
overlap with a lot of the stack range we're supposed to scrub, because
of the large fixed-size outgoing args and register save area.
Unfortunately, setting up the PIC register seems to prevent the frame
pointer from being omitted.
Since the strub runtime doesn't issue calls or use global variables,
at least on sparc, disabling PIC to compile strub.c seems to do the
right thing.
for libgcc/ChangeLog
PR middle-end/112917
* config.host (sparc, sparc64): Enable...
* config/sparc/t-sparc: ... this new fragment.
Builtin expanders for memset and memcpy may involve conditionals and
loops, but their sequences may end up being emitted on edges. Alas,
commit_one_edge_insertion rejects sequences that end with a jump, a
requirement that makes sense for insertions after expand, but not so
much during expand.
During expand, jumps may appear in the middle of the insert sequence
as much as in the end, and it's only after committing edge insertions
out of PHI nodes that we go through the entire function splitting
blocks where needed, so relax the assert in commit_one_edge_insertion
so that jumps are accepted during expand even at the end of the
sequence.
for gcc/ChangeLog
PR rtl-optimization/113002
* cfgrtl.cc (commit_one_edge_insertion): Tolerate jumps in the
inserted sequence during expand.
for gcc/testsuite/ChangeLog
PR rtl-optimization/113002
* gcc.dg/vect/pr113002.c: New.
Instead of get and set macros to apply a delta, use a single macro
that resorts to a temporary wrapper class to apply it.
for gcc/ChangeLog
* builtins.cc (delta_type): New template class.
(set_apply_args_size, get_apply_args_size): Replace with...
(saved_apply_args_size): ... this.
(set_apply_result_size, get_apply_result_size): Replace with...
(saved_apply_result_size): ... this.
(apply_args_size, apply_result_size): Adjust.
The GCC manual has a whole section on signedness of bitfields with the ultimate
conclusion that the property really isn't an ABI issue, but instead a C dialect
issue (agreed). Furthermore it concludes that all targets should behave the
same by default.
So it was a mistake for the mcore port to force bitfields to be unsigned and
that never should have been included. This patch rectifies that problem.
I should have remembered this -- I went down this path once in the 90s. I
don't recall which port anymore, but once Joseph mentioned this policy, bits
and pieces did start to come back to me.
Restoring the proper default happens to also fix 170 tests in the GCC
testsuite, some of which would go into infinite loops when bitfields were
treated as unsigned values (pr88621 for example). Essentially the testing
time is cut in half, which was actually the point of digging into pr88621 to
begin with.
gcc/
* config/mcore/mcore.h (CC1_SPEC): Do not set -funsigned-bitfields.
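As a generic illustration of how bit-field signedness alone can decide
whether a loop terminates, consider the sketch below. It is not the
pr88621 test. The fields are explicitly signed or unsigned, so the
result does not depend on -fsigned-bitfields or -funsigned-bitfields.
The struct names, countdown_steps and the max_steps cap are
hypothetical.

```cpp
#include <cassert>

// Explicitly signed and unsigned 3-bit fields.
struct signed_field   { signed int   f : 3; };
struct unsigned_field { unsigned int f : 3; };

// Count iterations of a countdown loop that relies on the field going
// negative.  max_steps caps the loop so the unsigned case also returns.
template <typename S>
int
countdown_steps (int max_steps)
{
  S s;
  s.f = 3;
  int steps = 0;
  while (s.f >= 0 && steps < max_steps)
    {
      s.f--;	// Signed: 0 becomes -1.  Unsigned: 0 wraps to 7.
      steps++;
    }
  return steps;
}

void
example_bitfields ()
{
  // The signed field reaches -1 after four decrements and the loop ends.
  assert (countdown_steps<signed_field> (100) == 4);
  // The unsigned field is never negative, so only the cap stops the loop.
  assert (countdown_steps<unsigned_field> (100) == 100);
}
```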
I added some -finline-stringops tests that included memcmp-1.c, but
carried over the timeout factor onto only one such test. Jeff Law
kindly pointed that out (thanks!), so here's the fix.
for gcc/testsuite/ChangeLog
* gcc.dg/torture/inline-mem-cmp-1.c: Copy timeout factor from
mem-cmp-1.c.
* gcc.dg/torture/inline-mem-cpy-1.c: Likewise.
It is always safe to set the computed bit for dynamic object sizes at
the end of collect_object_sizes_for because even in case of a dependency
loop encountered in nested calls, we have an SSA temporary to actually
finish the object size expression. The reexamine pass for dynamic
object sizes is only for propagation of unknowns and gimplification of
the size expressions, not for loop resolution as in the case of static
object sizes.
gcc/ChangeLog:
PR tree-optimization/113012
* tree-object-size.cc (compute_builtin_object_size): Expand
comment for dynamic object sizes.
(collect_object_sizes_for): Always set COMPUTED bitmap for
dynamic object sizes.
gcc/testsuite/ChangeLog:
PR tree-optimization/113012
* gcc.dg/ubsan/pr113012.c: New test case.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
Instead of global optimization levels and flags, check per-function
ones.
for gcc/ChangeLog
* ipa-strub.cc (gsi_insert_finally_seq_after_call): Check
per-function optimization levels and flags instead of global ones.
(pass_ipa_strub::adjust_at_calls_call): Likewise.
The strub builtins are not suited for cross-unit inlining, they should
only be inlined by the builtin expanders, if at all. While testing on
sparc64, it occurred to me that, if libgcc was built with LTO enabled,
lto1 might inline them, and that would likely break things. So, make
sure they're clearly marked as not inlinable.
for libgcc/ChangeLog
* strub.c (ATTRIBUTE_NOINLINE): New.
(ATTRIBUTE_STRUB_CALLABLE): Add it.
(__strub_dummy_force_no_leaf): Drop it.
sol2.h may define LINK_PIE_SPEC and leave LD_PIE_SPEC undefined, but
gcc.cc will only provide a LD_PIE_SPEC definition if LINK_PIE_SPEC is
not defined, and then it uses LD_PIE_SPEC guarded by #ifdef HAVE_LD_PIE
only. Add LD_PIE_SPEC to the guard.
gcc/ChangeLog
* gcc.cc (process_command): Use LD_PIE_SPEC only if defined.
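The guard change described above can be sketched with the preprocessor
alone. The macro values below are placeholders, not the real spec
strings, and ld_pie_spec_or_empty is a hypothetical helper rather than
anything in gcc.cc.

```cpp
#include <cassert>

// Stand-ins for the configuration in question: the linker supports PIE
// and a target header defined LINK_PIE_SPEC, so nothing supplies
// LD_PIE_SPEC.
#define HAVE_LD_PIE 1
#define LINK_PIE_SPEC "<placeholder target spec>"

// Fallback in the style described: only provided when the target did
// not define LINK_PIE_SPEC, hence skipped here.
#ifndef LINK_PIE_SPEC
#define LD_PIE_SPEC "<placeholder default spec>"
#endif

// Guarding on HAVE_LD_PIE alone would fail to compile in this
// configuration, since LD_PIE_SPEC is undefined.  Testing both macros
// keeps the use well-formed.
const char *
ld_pie_spec_or_empty ()
{
#if defined (HAVE_LD_PIE) && defined (LD_PIE_SPEC)
  return LD_PIE_SPEC;
#else
  return "";
#endif
}

void
example_pie_guard ()
{
  assert (ld_pie_spec_or_empty ()[0] == '\0');
}
```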
Here we first use and therefore synthesize the local class operator<=>
from an unevaluated context, which inadvertently affects synthesization
by preventing functions used within the definition (such as the copy
constructor of std::strong_ordering) from getting marked as odr-used.
This patch fixes this by using maybe_push_to_top_level in synthesize_method
which ensures cp_unevaluated_operand gets cleared even in the function-local
case.
PR c++/113063
gcc/cp/ChangeLog:
* method.cc (synthesize_method): Use maybe_push_to_top_level
and maybe_pop_from_top_level.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/spaceship-synth16.C: New test.
In the function-local case of maybe_pop_from_top_level, we need to
restore the global flags that maybe_push_to_top_level cleared.
gcc/cp/ChangeLog:
* name-lookup.cc (struct local_state_t): Define.
(local_state_stack): Define.
(maybe_push_to_top_level): Use them.
(maybe_pop_from_top_level): Likewise.
* pt.cc (instantiate_decl): Remove dead code for saving/restoring
cp_unevaluated_operand and c_inhibit_evaluation_warnings.
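A minimal sketch of the save/restore pairing in the entry above,
assuming the two flags named in the pt.cc line are the ones being
cleared and restored. The member names of local_state_t, the use of
std::vector, and the push_local_state/pop_local_state helpers are
hypothetical and not taken from name-lookup.cc.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for the global flags.
static int cp_unevaluated_operand;
static int c_inhibit_evaluation_warnings;

// Hypothetical layout: one saved copy of each flag.
struct local_state_t
{
  int saved_unevaluated_operand;
  int saved_inhibit_evaluation_warnings;
};

static std::vector<local_state_t> local_state_stack;

// Function-local push: remember the flags, then clear them.
void
push_local_state ()
{
  local_state_stack.push_back ({cp_unevaluated_operand,
				c_inhibit_evaluation_warnings});
  cp_unevaluated_operand = 0;
  c_inhibit_evaluation_warnings = 0;
}

// Function-local pop: restore exactly what the matching push saved.
void
pop_local_state ()
{
  assert (!local_state_stack.empty ());
  const local_state_t saved = local_state_stack.back ();
  local_state_stack.pop_back ();
  cp_unevaluated_operand = saved.saved_unevaluated_operand;
  c_inhibit_evaluation_warnings = saved.saved_inhibit_evaluation_warnings;
}
```

Using a stack rather than a single saved copy lets nested push/pop
pairs each restore their own values.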
Calling a non-static member function on a null pointer is undefined
behaviour (see [expr.ref] p8) and should error in constant evaluation,
even if the 'this' pointer is never actually accessed within that
function.
One catch is that currently, the function pointer conversion operator
for lambdas passes a null pointer as the 'this' pointer to the
underlying 'operator()', so for now we ignore such calls.
PR c++/102420
gcc/cp/ChangeLog:
* constexpr.cc (cxx_bind_parameters_in_call): Check for calling
non-static member functions with a null pointer.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-memfn2.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
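The kind of code affected by the entry above can be sketched as
follows. This is not the new testcase. The type and names are
hypothetical, the rejected call is left in a comment so the block still
compiles, and the lambda part assumes C++17 or later.

```cpp
#include <cassert>

struct widget
{
  // Never touches 'this'.
  constexpr int answer () const { return 42; }
};

// A call through a valid object is a constant expression.
constexpr widget w {};
static_assert (w.answer () == 42, "call on a valid object");

// By contrast, a call through a null pointer such as
//   constexpr int bad
//     = static_cast<const widget *> (nullptr)->answer ();
// is undefined behaviour and is expected to be rejected during constant
// evaluation, even though answer () never reads through 'this'.

// A captureless lambda converted to a function pointer remains usable
// in constant expressions (requires C++17).
constexpr int (*add_one) (int) = [] (int x) { return x + 1; };
static_assert (add_one (1) == 2, "call through converted lambda");

void
example_constexpr_memfn ()
{
  assert (w.answer () == 42);
  assert (add_one (41) == 42);
}
```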
The linking of libgcc is already present in %(liborig), so the current
situation duplicates libraries. This was not an issue until macOS's new
linker started giving warnings for such cases.
libgfortran/ChangeLog:
PR libfortran/110651
* libgfortran.spec.in: Remove duplicate libraries.
This patch adds "hpe" to the known properties for the "vendor" selector,
and support for "acquire" and "release" for "atomic_default_mem_order".
gcc/ChangeLog
* omp-general.cc (vendor_properties): Add "hpe".
(atomic_default_mem_order_properties): Add "acquire" and "release".
(omp_context_selector_matches): Handle "acquire" and "release".
gcc/testsuite/ChangeLog
* c-c++-common/gomp/declare-variant-2.c: Don't expect error on
"acquire" and "release".
* gfortran.dg/gomp/declare-variant-2a.f90: Likewise.
This patch introduces enumerators to represent trait-set names and
trait names, which makes it easier to use tables to control other
behavior and for switch statements to dispatch on the tags. The tags
are stored in the same place in the TREE_LIST structure (OMP_TSS_ID or
OMP_TS_ID) and are encoded there as integer constants.
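The enumerator-plus-table approach can be sketched as below. The
enumerator names, the table layout and both helper functions are
hypothetical; the real definitions are in omp-selectors.h and
omp-general.cc and may differ. Only the four trait-set names come from
OpenMP itself.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical enumerators for the trait-set names.
enum trait_set_code
{
  TSS_CONSTRUCT,
  TSS_DEVICE,
  TSS_IMPLEMENTATION,
  TSS_USER,
  TSS_LAST,
  TSS_INVALID = -1
};

// Table indexed by enumerator.
static const char *const trait_set_names[TSS_LAST] =
{
  "construct", "device", "implementation", "user"
};

// Map a name to its code, or TSS_INVALID if unknown.
trait_set_code
lookup_trait_set_code (const char *name)
{
  for (int i = 0; i < TSS_LAST; i++)
    if (std::strcmp (name, trait_set_names[i]) == 0)
      return static_cast<trait_set_code> (i);
  return TSS_INVALID;
}

// With codes in hand, consumers can dispatch with a switch instead of
// comparing strings.
bool
trait_set_is_known (trait_set_code code)
{
  switch (code)
    {
    case TSS_CONSTRUCT:
    case TSS_DEVICE:
    case TSS_IMPLEMENTATION:
    case TSS_USER:
      return true;
    default:
      return false;
    }
}
```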
gcc/ChangeLog
* omp-selectors.h: New file.
* omp-general.h: Include omp-selectors.h.
(OMP_TSS_CODE, OMP_TSS_NAME): New.
(OMP_TS_CODE, OMP_TS_NAME): New.
(make_trait_set_selector, make_trait_selector): Adjust declarations.
(omp_construct_traits_to_codes): Likewise.
(omp_context_selector_set_compare): Likewise.
(omp_get_context_selector): Likewise.
(omp_get_context_selector_list): New.
* omp-general.cc (omp_construct_traits_to_codes): Pass length in
as argument instead of returning it. Make it table-driven.
(omp_tss_map): New.
(kind_properties, vendor_properties, extension_properties): New.
(atomic_default_mem_order_properties): New.
(omp_ts_map): New.
(omp_check_context_selector): Simplify lookup and dispatch logic.
(omp_mark_declare_variant): Ignore variants with unknown construct
selectors. Adjust for new representation.
(make_trait_set_selector, make_trait_selector): Adjust for new
representations.
(omp_context_selector_matches): Simplify dispatch logic. Avoid
fixed-sized buffers and adjust call to omp_construct_traits_to_codes.
(omp_context_selector_props_compare): Adjust for new representations
and simplify dispatch logic.
(omp_context_selector_set_compare): Likewise.
(omp_context_selector_compare): Likewise.
(omp_get_context_selector): Adjust for new representations, and split
out...
(omp_get_context_selector_list): New function.
(omp_lookup_tss_code): New.
(omp_lookup_ts_code): New.
(omp_context_compute_score): Adjust for new representations. Avoid
fixed-sized buffers and magic numbers. Adjust call to
omp_construct_traits_to_codes.
* gimplify.cc (omp_construct_selector_matches): Avoid use of
fixed-size buffer. Adjust call to omp_construct_traits_to_codes.
gcc/c/ChangeLog
* c-parser.cc (omp_construct_selectors): Delete.
(omp_device_selectors): Delete.
(omp_implementation_selectors): Delete.
(omp_user_selectors): Delete.
(c_parser_omp_context_selector): Adjust for new representations
and simplify dispatch logic. Uniformly warn instead of sometimes
error when an unknown selector is found. Adjust error messages
for extraneous/incorrect score.
(c_parser_omp_context_selector_specification): Likewise.
(c_finish_omp_declare_variant): Adjust for new representations.
gcc/cp/ChangeLog
* decl.cc (omp_declare_variant_finalize_one): Adjust for new
representations.
* parser.cc (omp_construct_selectors): Delete.
(omp_device_selectors): Delete.
(omp_implementation_selectors): Delete.
(omp_user_selectors): Delete.
(cp_parser_omp_context_selector): Adjust for new representations
and simplify dispatch logic. Uniformly warn instead of sometimes
error when an unknown selector is found. Adjust error messages
for extraneous/incorrect score.
(cp_parser_omp_context_selector_specification): Likewise.
* pt.cc (tsubst_attribute): Adjust for new representations.
gcc/fortran/ChangeLog
* gfortran.h: Include omp-selectors.h.
(enum gfc_omp_trait_property_kind): Delete, and replace all
references with equivalent omp_tp_type enumerators.
(struct gfc_omp_trait_property): Update for omp_tp_type.
(struct gfc_omp_selector): Replace string name with new enumerator.
(struct gfc_omp_set_selector): Likewise.
* openmp.cc (gfc_free_omp_trait_property_list): Update for
omp_tp_type.
(omp_construct_selectors): Delete.
(omp_device_selectors): Delete.
(omp_implementation_selectors): Delete.
(omp_user_selectors): Delete.
(gfc_ignore_trait_property_extension): New.
(gfc_ignore_trait_property_extension_list): New.
(gfc_match_omp_selector): Adjust for new representations and simplify
dispatch logic. Uniformly warn instead of sometimes error when an
unknown selector is found.
(gfc_match_omp_context_selector): Adjust for new representations.
Adjust error messages for extraneous/incorrect score.
(gfc_match_omp_context_selector_specification): Likewise.
* trans-openmp.cc (gfc_trans_omp_declare_variant): Adjust for
new representations.
gcc/testsuite/
* c-c++-common/gomp/declare-variant-1.c: Expect warning on
unknown selectors.
* c-c++-common/gomp/declare-variant-2.c: Likewise. Also adjust
messages for score errors.
* c-c++-common/gomp/declare-variant-no-score.c: New.
* gfortran.dg/gomp/declare-variant-1.f90: Expect warning on
unknown selectors.
* gfortran.dg/gomp/declare-variant-2.f90: Likewise. Also adjust
messages for score errors.
* gfortran.dg/gomp/declare-variant-no-score.f90: New.
Previously, name-list properties specified as identifiers were stored
in the TREE_PURPOSE/OMP_TP_NAME slot, while those specified as strings
were stored in the TREE_VALUE/OMP_TP_VALUE slot. This patch puts both
representations in OMP_TP_VALUE with a magic cookie in OMP_TP_NAME.
gcc/ChangeLog
* omp-general.h (OMP_TP_NAMELIST_NODE): New.
* omp-general.cc (omp_context_name_list_prop): Move earlier
in the file, and adjust for new representation.
(omp_check_context_selector): Adjust this too.
(omp_context_selector_props_compare): Likewise.
gcc/c/ChangeLog
* c-parser.cc (c_parser_omp_context_selector): Adjust for new
namelist property representation.
gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_context_selector): Adjust for new
namelist property representation.
* pt.cc (tsubst_attribute): Likewise.
gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant): Adjust for
new namelist property representation.