During investigate the support of early break autovec, we notice
the test full-vec-move1.c will be optimized to 'return 0;' in main
function body. Because somehow the value of V type is compiler
time constant, and then the second loop will be considered as
assert (true).
Thus, the ccp4 pass will eliminate these stmt and just return 0.
typedef int16_t V __attribute__((vector_size (128)));
int main ()
{
V v;
for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
(v)[i] = i;
V res = v;
for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
assert (res[i] == i); // will be optimized to assert (true)
}
This patch would like to introduce a extern function to use the res[i]
that get rid of the ccp4 optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c:
Introduce extern func use to get rid of ccp4 optimization.
Signed-off-by: Pan Li <pan2.li@intel.com>
The test FAILs on i686-linux due to
.../gcc/testsuite/g++.dg/torture/vector-subaccess-1.C:16:6: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
excess warnings.
This fixes it by adding -Wno-psabi, like commonly done in other tests.
2024-05-09 Jakub Jelinek <jakub@redhat.com>
PR c++/89224
* g++.dg/torture/vector-subaccess-1.C: Add -Wno-psabi as additional
options.
cpymemsi expansion was available for RISC-V since the initial port.
However, there are not tests to detect regression.
This patch adds such tests.
Three of the tests target the expansion requirements (known length and
alignment). One test reuses an existing memcpy test from the by-pieces
framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c).
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymemsi-1.c: New test.
* gcc.target/riscv/cpymemsi-2.c: New test.
* gcc.target/riscv/cpymemsi-3.c: New test.
* gcc.target/riscv/cpymemsi.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
After reading the ICE for the PR, it's obvious the error message is
rather cryptic. This makes it less so.
gcc/ChangeLog:
* range-op.cc (range_op_handler::discriminator_fail): Reword error
message.
Originally eliminate_regs_in_insnit will transform
(parallel [
(set (reg:QI 130)
(plus:QI (subreg:QI (reg:DI 19 frame) 0)
(const_int 96)))
(clobber (reg:CC 17 flag))]) {*addqi_1}
to
(set (reg:QI 130)
(subreg:QI (reg:DI 19 frame) 0)) {*movqi_internal}
when verify_changes.
But with No Flags add, it transforms
(set (reg:QI 5 di)
(plus:QI (subreg:QI (reg:DI 19 frame) 0)
(const_int 96))) {*addqi_1_nf}
to
(set (reg:QI 5 di)
(subreg:QI (reg:DI 19 frame) 0)) {*addqi_1_nf}.
there is no extra clobbers at the end, and
its dest reg just is a hardreg. For ix86_hardreg_mov_ok,
it returns false. So it fails to update insn and causes
the ICE when transform to movqi_internal.
But actually it is ok and safe for ix86_hardreg_mov_ok
when lra_in_progress.
And tested the spec2017, the performance was not affected.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_hardreg_mov_ok): Relax
hard reg mov restriction when lra in progress.
Reposting without the patch that ignores whitespace. The CI system doesn't
like including both patches, that'll generate a failure to apply and none of
the tests actually get run.
So I managed to goof the if-then-else level of the bseti bits last week. They
were supposed to be a last ditch effort to improve the result, but ended up
inside a conditional where they don't really belong. I almost always use Zba,
Zbb and Zbs together, so it slipped by.
So it's NFC if you always test with Zbb and Zbs enabled together. But if you
enabled Zbs without Zbb you'd see a failure to use bseti.
gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Fix incorrect
if-then-else nesting of Zbs code.
This supports __powidf2 by means of a double wrapper for already
existing f7_powi (renamed to __f7_powi by f7-renames.h).
It tweaks the implementation so that it does not perform trivial
multiplications with 1.0 any more, but instead uses a move.
It also fixes the last statement of f7_powi, which was wrong.
Notice that f7_powi was unused until now.
PR target/114981
libgcc/config/avr/libf7/
* libf7-common.mk (F7_ASM_PARTS): Add D_powi
* libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function.
* libf7.c (f7_powi): Fix last (wrong) statement.
Tweak trivial multiplications with 1.0.
gcc/testsuite/
* gcc.target/avr/pr114981-powil.c: New test.
PR114810 was fixed in machine-dependent way. This patch is a fix of
the PR on LRA side. LRA chose alternative with constraints `&r,r,ro`
on i686 when all operands of DImode and there are only 6 available
general regs. The patch recognizes such case and significantly
increase the alternative cost. It does not reject alternative
completely. So the fix is safe but it might not work for all
potentially possible cases of registers lack as register classes can
have any relations including subsets and intersections.
gcc/ChangeLog:
PR target/114810
* lra-constraints.cc (process_alt_operands): Calculate union reg
class for the alternative, peak matched regs and required reload
regs. Recognize alternatives with lack of available registers and
make them costly. Add debug print about this case.
The PR complains that
void do_something(){
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-label"
start:;
#pragma GCC diagnostic pop
} #1
doesn't work. That's because we warn_for_unused_label only while we're
in finish_function, meaning we're at #1 where we're outside the #pragma
region. We can use suppress_warning + warning_suppressed_p to fix this.
Note that I'm not using TREE_USED. Propagating it in tsubst_stmt/LABEL_EXPR
from decl to label would mean that we don't warn in do_something2, but
I think we want the warning there: we're in a template and the goto is
a discarded statement.
PR c++/113582
gcc/c-family/ChangeLog:
* c-warn.cc (warn_for_unused_label): Don't warn if -Wunused-label has
been suppressed for the label.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_label_for_labeled_statement): suppress_warning
if it's not enabled at input_location.
* pt.cc (tsubst_stmt): Call copy_warning.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wunused-label-4.C: New test.
We can optimize `a == nonnegative ? a : ABS<a>`, `a > nonnegative ? a : ABS<a>`
and `a >= nonnegative ? a : ABS<a>` into `ABS<a>`. This allows removal of
some extra comparison and extra conditional moves in some cases.
I don't remember where I had found though but it is simple to add so
let's add it.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note I have a secondary pattern for the equal case as either a or nonnegative
could be used.
PR tree-optimization/112392
gcc/ChangeLog:
* match.pd (`x CMP nonnegative ? x : ABS<x>`): New pattern;
where CMP is ==, > and >=.
(`x CMP nonnegative@y ? y : ABS<x>`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-41.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Currently, code sinking will sink code at the use points with loop having same
nesting depth. The following patch improves code sinking by placing the sunk
code in begining of the block after the labels.
2024-05-08 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>
gcc/ChangeLog:
PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location):Sink statements at
the begining of the basic block after labels.
gcc/testsuite/ChangeLog:
PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
The lshr<GPR:mode>3_zero_extend_4 pattern targets bit extraction
with zero-extension. This pattern represents the canonical form
of zero-extensions of a logical right shift.
The same optimization can be applied to sign-extensions.
Given the two optimizations are so similar, this patch converts
the existing one to also cover the sign-extension case as well.
gcc/ChangeLog:
* config/riscv/iterators.md (ashiftrt): New code attribute
'extract_shift' and adding extractions to optab.
* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): Rename to...
(*<any_extract:optab><GPR:mode>3):...this and add support for
sign-extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/extend-shift-helpers.h: Add helpers for
sign-extension.
* gcc.target/riscv/sign-extend-rshift-32.c: New test.
* gcc.target/riscv/sign-extend-rshift-64.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instructions. Let's add a insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.
Tested with SPEC CPU 2017 (rv64gc).
PR target/111501
gcc/ChangeLog:
* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): New
pattern for zero-extraction.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/extend-shift-helpers.h: New test.
* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
The pattern lshrsi3_zero_extend_2 extracts the MSB bits of the lower
32-bit word and zero-extends it back to DImode.
This is realized using srliw, which operates on 32-bit registers.
The same optimziation can be applied to sign-extensions when emitting
a sraiw instead of the srliw.
Given these two optimizations are so similar, this patch simply
converts the existing one to also cover the sign-extension case as well.
gcc/ChangeLog:
* config/riscv/iterators.md (sraiw): New code iterator 'any_extract'.
New code attribute 'extract_sidi_shift'.
* config/riscv/riscv.md (*lshrsi3_zero_extend_2): Rename to...
(*lshrsi3_extend_2):...this and add support for sign-extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
We already optimize a sign-extension of a right-shift by 31 in
<optab>si3_extend. Let's add a test for that (similar to
zero-extend-1.c).
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sign-extend-1.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
For pointer reductions we need to convert the initial value to
the vector component integer type.
* tree-vect-loop.cc (get_initial_defs_for_reduction): Convert
initial value to the vector component type.
When we have a non-grouped access we bogously multiply by zero.
This shows most with single-lane SLP but also happens with
the multi-lane splat case.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment):
Properly guard DR_GROUP_SIZE access with STMT_VINFO_GROUPED_ACCESS.
This fixes a typo in combine_reg_notes in the load/store pair fusion
pass. As it stands, the calls to filter_notes store any
REG_FRAME_RELATED_EXPR to fr_expr with the following association:
- i2 -> fr_expr[0]
- i1 -> fr_expr[1]
but then the checks inside the following if statement expect the
opposite (more natural) association, i.e.:
- i2 -> fr_expr[1]
- i1 -> fr_expr[0]
this patch fixes the oversight by swapping the fr_expr indices in the
calls to filter_notes.
In hindsight it would probably have been less confusing / error-prone to
have combine_reg_notes take an array of two insns, then we wouldn't have
to mix 1-based and 0-based indexing as well as remembering to call
filter_notes in reverse program order. This however is a minimal fix
for backporting purposes.
gcc/ChangeLog:
PR target/114936
* config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes):
Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in
FR_EXPR[N-1], thus matching the correspondence expected by the
copy_rtx calls.
This fixes a couple of tests (gcc.dg/vect/pr109011-*.c) on s390 where
loops are unrolled although -fno-unroll-loops is specified.
gcc/ChangeLog:
* tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
-fno-unroll-loops.
When insert_updated_phi_nodes_for tries to skip pruning the IDF to
blocks dominated by the nearest common dominator of the set of
definition blocks it compares against ENTRY_BLOCK but that's never
going to be the common dominator. In fact if it ever were the code
fails to copy IDF to PRUNED_IDF, leading to wrong code.
The following fixes that by avoiding the copy and pruning from the
IDF in-place as well as using the more approprate check against
the single successor of the ENTRY_BLOCK.
* tree-into-ssa.cc (insert_updated_phi_nodes_for): Skip
pruning when the nearest common dominator is the successor
of ENTRY_BLOCK. Do not copy IDF but prune it directly.
The optimize_range_tests_to_bit_test optimization normally emits a range
test first:
if (entry_test_needed)
{
tem = build_range_check (loc, optype, unshare_expr (exp),
false, lowi, high);
if (tem == NULL_TREE || is_gimple_val (tem))
continue;
}
so during the bit test we already know that exp is in the [lowi, high]
range, but skips it if we have range info which tells us this isn't
necessary.
Also, normally it emits shifts by exp - lowi counter, but has an
optimization to use just exp counter if the mask isn't a more expensive
constant in that case and lowi is > 0 and high is smaller than prec.
The following testcase is miscompiled because the two abnormal cases
are triggered. The range of exp is [43, 43][48, 48][95, 95], so we on
64-bit arch decide we don't need the entry test, because 95 - 43 < 64.
And we also decide to use just exp as counter, because the range test
tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
Because 95 is in the exp range, we can't do that, we'd either need to
do a range test first, i.e.
if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1))
or need to subtract lowi from the shift counter, i.e.
if ((1UL << (exp - 43)) & mask2)
but can't do both unless r.upper_bound () is < prec.
The following patch ensures that.
2024-05-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114965
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
optimize away exp - lowi subtraction from shift count unless entry
test is emitted or unless r.upper_bound () is smaller than prec.
* gcc.c-torture/execute/pr114965.c: New test.
This removes the last parameter of choose_multiplier, which is unused, adds
another assertion and more details to the description and various comments.
Likewise to the closely related invert_mod2n, except for the last parameter.
[changelog]
* expmed.h (choose_multiplier): Tweak description and remove last
parameter.
* expmed.cc (choose_multiplier): Likewise. Add assertion for the
third parameter and adds details to various comments.
(invert_mod2n): Tweak description and add assertion for the first
parameter.
(expand_divmod): Adjust calls to choose_multiplier.
* tree-vect-generic.cc (expand_vector_divmod): Likewise.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Likewise.
(if_then_else:SI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg/v:SI 101 [ e ])
(reg:SI 102))
The cost is 8 for the rtx, the cost for
(eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4,
but this is just an operator do not need to compute it's cost in cmov.
gcc/ChangeLog:
PR target/109549
* config/i386/i386.cc (ix86_rtx_costs): The XEXP (x, 0) for cmov
is an operator do not need to compute cost.
gcc/testsuite/ChangeLog:
* gcc.target/i386/cmov6.c: Fixed.
This throws the switch on prange. After this patch, it is no longer
valid to store a pointer in an irange (or vice versa). Instead, they
must go in prange, which is faster and more memory efficient.
I will push this now, so I have time to do any follow-up bugfixing
before going on paternity leave.
There are various cleanups we plan on doing after this patch (faster
intersect/union, remove range-op-mixed.h, remove value_range in favor
of int_range_max, reclaim the name for the Value_Range temporary,
clean up range-ops, etc etc). But we will hold off on those for now
to make it easier to revert this patch, if for some reason we need to
do so while I'm away.
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-cache.cc (sbr_sparse_bitmap::sbr_sparse_bitmap):
Change irange to prange.
* gimple-range-fold.cc (fold_using_range::fold_stmt): Same.
(fold_using_range::range_of_address): Same.
* gimple-range-fold.h (range_of_address): Same.
* gimple-range-infer.cc (gimple_infer_range::add_nonzero): Same.
* gimple-range-op.cc (class cfn_strlen): Same.
* gimple-range-path.cc
(path_range_query::adjust_for_non_null_uses): Same.
* gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses): Same.
* tree-ssa-structalias.cc (find_what_p_points_to): Same.
* range-op-ptr.cc (range_op_table::initialize_pointer_ops): Remove
hybrid entries in table.
* range-op.cc (range_op_table::range_op_table): Add pointer
entries for bitwise and/or and min/max.
* value-range.cc (irange::verify_range): Add assert.
* value-range.h (irange::varying_compatible_p): Remove check for
error_mark_node.
(irange::supports_p): Remove pointer support.
* ipa-cp.h (ipa_supports_p): Add prange support.
In r14-9266-g2823b4d96d9ec4 I gave all temporary vars a DECL_CONTEXT,
including those at namespace or global scope, so that they could be
properly merged across importers. However, not all of these temporary
vars are actually supposed to be mergeable.
For instance, in the attached testcase we have an unnamed temporary var
used in the NSDMI of a class member, which cannot properly merged -- but
it also doesn't need to be, as it'll be thrown away when the class type
itself is merged anyway.
This patch reverts the change made above and instead makes a weaker
adjustment that only causes temporary vars with linkage have a
DECL_CONTEXT to merge from. This way these unnamed, "unmergeable"
temporaries are properly streamed by value again.
PR c++/114856
gcc/cp/ChangeLog:
* call.cc (make_temporary_var_for_ref_to_temp): Set context for
temporaries with linkage.
* init.cc (create_temporary_var): Revert to only set context
when in a function decl.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114856.h: New test.
* g++.dg/modules/pr114856_a.H: New test.
* g++.dg/modules/pr114856_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
After r7-987-gf17a223de829cb, the access for the elements of a vector type would lose the qualifiers.
So if we had `constvector[0]`, the type of the element of the array would not have const on it.
This was due to a missing build_qualified_type for the inner type of the vector when building the array type.
We need to add back the call to build_qualified_type and now the access has the correct qualifiers. So the
overloads and even if it is a lvalue or rvalue is correctly done.
Note we correctly now reject the testcase gcc.dg/pr83415.c which was incorrectly accepted after r7-987-gf17a223de829cb.
Built and tested for aarch64-linux-gnu.
PR c++/89224
gcc/c-family/ChangeLog:
* c-common.cc (convert_vector_to_array_for_subscript): Call build_qualified_type
for the inner type.
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_array_reference): Compare main variants
for the vector/array types instead of the types directly.
gcc/testsuite/ChangeLog:
* g++.dg/torture/vector-subaccess-1.C: New test.
* gcc.dg/pr83415.c: Change warning to error.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
In C++ sometimes you have a deconstructor function which is "empty", like for an
example with unions or with arrays. The front-end might not know it is empty either
so this should be done on during optimization.o
To implement it I added it to DCE where we mark if a statement is necessary or not.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Changes since v1:
* v2: Add support for __aeabi_atexit for arm-*eabi. Add extra comments.
Add cxa_atexit-5.C testcase for -fPIC case.
* v3: Fix testcases for the __aeabi_atexit (forgot to do in the v2).
PR tree-optimization/19661
gcc/ChangeLog:
* tree-ssa-dce.cc (is_cxa_atexit): New function.
(is_removable_cxa_atexit_call): New function.
(mark_stmt_if_obviously_necessary): Don't mark removable
cxa_at_exit calls.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/cxa_atexit-1.C: New test.
* g++.dg/tree-ssa/cxa_atexit-2.C: New test.
* g++.dg/tree-ssa/cxa_atexit-3.C: New test.
* g++.dg/tree-ssa/cxa_atexit-4.C: New test.
* g++.dg/tree-ssa/cxa_atexit-5.C: New test.
* g++.dg/tree-ssa/cxa_atexit-6.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This adds a few more of what is currently done in phiopt's value_replacement
to match. I noticed this when I was hooking up phiopt's value_replacement
code to use match and disabling the old code. But this can be done
independently from the hooking up phiopt's value_replacement as phiopt
is already hooked up for simplified versions already.
/* a != 0 ? a / b : 0 -> a / b iff b is nonzero. */
/* a != 0 ? a * b : 0 -> a * b */
/* a != 0 ? a & b : 0 -> a & b */
We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` which
uses that form.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/114894
gcc/ChangeLog:
* match.pd (`a != 0 ? a / b : 0`): New pattern.
(`a != 0 ? a * b : 0`): New pattern.
(`a != 0 ? a & b : 0`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-value-5.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Per quick email exchange with Palmer. Given the triviality, I'm just pushing
it.
gcc/
* config/riscv/riscv.cc (generic_ooo_tune_info): Turn on
overlap_op_by_pieces.
This is almost exclusively work from the VRULL team.
As we've discussed in the Tuesday meeting in the past, we'd like to have a knob
in the tuning structure to indicate that overlapped stores during
move_by_pieces expansion of memcpy & friends are acceptable.
This patch adds the that capability in our tuning structure. It's off for all
the uarchs upstream, but we have been using it inside Ventana for our uarch
with success. So technically it's NFC upstream, but puts in the infrastructure
multiple organizations likely need.
gcc/
* config/riscv/riscv.cc (struct riscv_tune_param): Add new
"overlap_op_by_pieces" field.
(rocket_tune_info, sifive_7_tune_info): Set it.
(sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
(thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
(generic_ooo_tune_info, optimize_size_tune_info): Likewise.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): define.
gcc/testsuite/
* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
* gcc.target/riscv/memset-nonoverlapping.c: New test.
The following patch imeplements the C++26 P2893R3 - Variadic friends
paper. The paper allows for the friend type declarations to specify
more than one friend type specifier and allows to specify ... at
the end of each. The patch doesn't introduce tentative parsing of
friend-type-declaration non-terminal, but rather just extends existing
parsing where it is a friend declaration which ends with ; after the
declaration specifiers to the cases where it ends with ...; or , or ...,
In that case it pedwarns for cxx_dialect < cxx26, handles the ... and
if there is , continues in a loop to parse the further friend type
specifiers.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR c++/114459
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_variadic_friend=202403L for C++26.
gcc/cp/
* parser.cc (cp_parser_member_declaration): Implement C++26
P2893R3 - Variadic friends. Parse friend type declarations
with ... or with more than one friend type specifier.
* friend.cc (make_friend_class): Allow TYPE_PACK_EXPANSION.
* pt.cc (instantiate_class_template): Handle PACK_EXPANSION_P
in friend classes.
gcc/testsuite/
* g++.dg/cpp26/feat-cxx26.C (__cpp_variadic_friend): Add test.
* g++.dg/cpp26/variadic-friend1.C: New test.
The HF and BF modes have the same size/precision and neither is
a subset nor superset of the other.
So, using either __extendhfbf2 or __trunchfbf2 is weird.
The expansion apparently emits __extendhfbf2, but on the libgcc side
we apparently have __trunchfbf2 implemented.
I think it is easier to switch to using what is available rather than
adding new entrypoints to libgcc, even alias, because this is backportable.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114907
* expr.cc (convert_mode_scalar): Use trunc_optab rather than
sext_optab for HF->BF conversions.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.
* gcc.dg/pr114907.c: New test.
In r9-5742 we've started allowing to inline always_inline functions into
functions which have disabled e.g. address sanitization even when the
always_inline function is implicitly from command line options sanitized.
This mostly works fine because most of the asan instrumentation is done only
late after ipa, but as the following testcase the .ASAN_MARK ifn calls
gimplifier adds can result in ICEs.
Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/114956
* tree-inline.cc: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.
* gcc.dg/asan/pr114956.c: New test.
This bug fix corrects the test codes below by converting the constant
literals to the type required by C. In the testcases below the values, 1
etc were converted into the INTEGER type before being passed to a C
vararg function. By default in modula2 constant literal ordinals are
represented as the ZTYPE (the largest GCC integer type node).
gcc/testsuite/ChangeLog:
PR modula2/114133
* gm2/extensions/run/pass/callingc10.mod: Convert constant
literal numbers into INTEGER.
* gm2/extensions/run/pass/callingc11.mod: Ditto.
* gm2/extensions/run/pass/vararg2.mod: Ditto.
* gm2/iso/run/pass/packed.mod: Emit a printf as a runtime
diagnostic.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
So with Chrstoph's patches from late 2022 we've had the ability to inline
strlen, and str[n]cmp (scalar). However, we never actually turned this
capability on by default!
This patch flips the those default to allow inlinining by default. It also
fixes one bug exposed by our internal testing when NBYTES is zero for strncmp.
I don't think that case happens enough to try and optimize it, we just disable
inline expansion for that instance.
This has been bootstrapped and regression tested on rv64gc at various times as
well as cross tested on rv64gc more times than I can probably count (we've have
this patch internally for a while). More importantly, I just successfully
tested it on rv64gc and rv32gcv elf configurations with the trunk
gcc/
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Do not inline
strncmp with zero size.
(emit_strcmp_scalar_compare_subword): Adjust rotation for rv32 vs rv64.
* config/riscv/riscv.opt (var_inline_strcmp): Enable by default.
(vriscv_inline_strncmp, riscv_inline_strlen): Likewise.
gcc/testsuite
* gcc.target/riscv/zbb-strlen-disabled-2.c: Turn off inlining.
Reuse MinGW definitions from i386 for libgcc. Move reused files to
libgcc/config/mingw folder.
libgcc/ChangeLog:
* config.host: Add aarch64-w64-mingw32 target. Adjust targets
after moving MinGW files.
* config/i386/t-gthr-win32: Move to...
* config/mingw/t-gthr-win32: ...here.
* config/i386/t-mingw-pthread: Move to...
* config/mingw/t-mingw-pthread: ...here.
* config/aarch64/t-no-eh: New file. EH is not yet implemented for
the target, and the default definition should be disabled.
Rename "x86 Windows Options" to "Cygwin and MinGW Options".
It will be used also for AArch64.
gcc/ChangeLog:
* config/i386/mingw-w64.opt.urls: Rename options' name and
regenerate option URLs.
* config/lynx.opt.urls: Likewise.
* config/mingw/cygming.opt.urls: Likewise.
* config/mingw/mingw.opt.urls: Likewise.
* doc/invoke.texi: Likewise.
SEH is not enabled in aarch64-w64-mingw32 target yet. However, it is
needed to be declared in machine_function for reusing winnt.cc.
gcc/ChangeLog:
* config/aarch64/aarch64.h (struct seh_frame_state): Declare SEH
structure in machine_function.
(GTY): Add SEH field.