Commit dd9e5f4db2 changed -march=native to
treat it as -mcpu=native if no other mcpu or mtune option was given.
It would make sense to document this, especially if we try to persuade
compilers like LLVM to take the same approach.
This patch documents that behaviour.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:
* doc/invoke.texi (AArch64 Options): Document rewriting of
-march=native to -mcpu=native.
The musttail error messages are reported to the user, so must be
translated.
gcc/ChangeLog:
PR c/83324
* calls.cc (initialize_argument_information): Mark messages
for translation.
(can_implement_as_sibling_call_p): Dito.
(expand_call): Dito.
When musttail is set, make tree-tailcall give error messages
when it cannot handle a call. This avoids vague "other reasons"
error messages later at expand time when it sees a musttail
function not marked tail call.
In various cases this requires delaying the error until
the call is discovered.
Also print more information on the failure to the dump file.
gcc/ChangeLog:
PR c/83324
* tree-tailcall.cc (maybe_error_musttail): New function.
(suitable_for_tail_opt_p): Report error reason.
(suitable_for_tail_call_opt_p): Report error reason.
(find_tail_calls): Accept basic blocks with abnormal edges.
Delay reporting of errors until the call is discovered.
Move top level suitability checks to here.
(tree_optimize_tail_calls_1): Remove top level checks.
Enable the tailcall optimization for non optimizing builds,
but in this case only checks calls that have the musttail attribute set.
This makes musttail work without optimization.
This is done with a new late musttail pass that is only active when
not optimizing. The new pass relies on tree-cfg to discover musttails.
This avoids a ~0.8% compiler run time penalty at -O0.
gcc/ChangeLog:
PR c/83324
* function.h (struct function): Add has_musttail.
* lto-streamer-in.cc (input_struct_function_base): Stream
has_musttail.
* lto-streamer-out.cc (output_struct_function_base): Dito.
* passes.def (pass_musttail): Add.
* tree-cfg.cc (notice_special_calls): Record has_musttail.
(clear_special_calls): Clear has_musttail.
* tree-pass.h (make_pass_musttail): Add.
* tree-tailcall.cc (find_tail_calls): Handle only_musttail
argument.
(tree_optimize_tail_calls_1): Pass on only_musttail.
(execute_tail_calls): Pass only_musttail as false.
(class pass_musttail): Add.
(make_pass_musttail): Add.
Some of the cfg fixups in pro_and_epilogue for sibcalls were dependent on "optimize".
Make them check cfun->tail_call_marked instead to handle the -O0 musttail
case. This fixes the musttail test cases on arm targets.
gcc/ChangeLog:
PR target/115255
* function.cc (thread_prologue_and_epilogue_insns): Check
cfun->tail_call_marked for sibcalls too.
(rest_of_handle_thread_prologue_and_epilogue): Dito.
- Give error messages for all causes of non sibling call generation
- When giving error messages clear the musttail flag to avoid ICEs
- Error out when tree-tailcall failed to mark a must-tail call
sibcall. In this case it doesn't know the true reason and only gives
a vague message.
gcc/ChangeLog:
PR c/83324
* calls.cc (maybe_complain_about_tail_call): Clear must tail
flag on error.
(expand_call): Give error messages for all musttail failures.
While lazy loading, instantiation of pendings can sometimes recursively
perform name lookup and begin further lazy loading. When using the
'-ftime-report' functionality this causes ICEs as we could start an
already-running timer for the importing.
This patch fixes the issue by using the 'timevar_cond*' API instead to
support such recursive calls.
PR c++/115165
gcc/cp/ChangeLog:
* module.cc (lazy_load_binding): Use 'timevar_cond*' APIs.
(lazy_load_pendings): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/timevar-1_a.H: New test.
* g++.dg/modules/timevar-1_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
When partially instantiating a previously declared hidden template
friend definition (at class template scope) such as slot_allocated in
the first testcase below, tsubst_friend_function needs to go through
all existing specializations thereof and make them point to the new
definition.
But when the previous declaration was also at class template scope,
old_decl is not the most general template, instead it's the partial
instantiation, and since instantiations are relative to the most general
template, old_decl's DECL_TEMPLATE_INSTANTIATIONS is empty. So we
to consistently use the most general template here. And when adjusting
DECL_TI_ARGS to match, only the innermost template arguments should be
preserved; the outer ones should correspond to the new definition.
Otherwise we fail a checking-only sanity check in instantiate_decl in
the first testcase, and in the second/third we end up emitting multiple
definitions of the template friend instantiation, resulting in a link
failure.
PR c++/112288
gcc/cp/ChangeLog:
* pt.cc (tsubst_friend_function): When adjusting existing
specializations after defining a previously declared template
friend, consider the most general template and correct
DECL_TI_ARGS adjustment.
gcc/testsuite/ChangeLog:
* g++.dg/template/friend80.C: New test.
* g++.dg/template/friend81.C: New test.
* g++.dg/template/friend81a.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Here we're neglecting to issue a -Wunused-value warning for suitable !
operator expressions, and in turn for != operator expressions that are
rewritten as !(x == y), only because we don't call warn_if_unused_value
on TRUTH_NOT_EXPR since its class is tcc_expression. This patch makes
us also consider warning for TRUTH_NOT_EXPR and also for ADDR_EXPR.
PR c++/114104
gcc/cp/ChangeLog:
* cvt.cc (convert_to_void): Call warn_if_unused_value for
TRUTH_NOT_EXPR and ADDR_EXPR as well.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wunused-20.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
When the scope of a qualified name is the current instantiation, and
qualified lookup finds nothing at template definition time, then we
know it'll find nothing at instantiation time (unless the current
instantiation has dependent bases). So such qualified name lookup
failure can be diagnosed ahead of time as per [temp.res.general]/6.
This patch implements that, for qualified names of the form (where
the current instantiation is A<T>):
this->non_existent
a.non_existent
A::non_existent
typename A::non_existent
It turns out we already optimistically attempt qualified lookup of
seemingly every qualified name, even when it's dependently scoped, and
then suppress issuing a lookup failure diagnostic after the fact.
So implementing this is mostly a matter of restricting the diagnostic
suppression to "dependentish" scopes (i.e. dependent scopes or the
current instantiation with dependent bases), rather than suppressing
for any dependently-typed scope as we currently do.
The cp_parser_conversion_function_id change is needed to avoid regressing
lookup/using8.C:
using A<T>::operator typename A<T>::Nested*;
When looking up A<T>::Nested we consider it not dependently scoped since
we entered A<T> from cp_parser_conversion_function_id earlier. But this
A<T> is the implicit instantiation A<T> not the primary template type A<T>,
and so the lookup fails which we now diagnose. This patch works around
this by not entering the template scope of a qualified conversion
function-id in this case, i.e. if we're in an expression vs declaration
context, by seeing if the type already went through finish_template_type
with entering_scope=true.
gcc/cp/ChangeLog:
* decl.cc (make_typename_type): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* error.cc (qualified_name_lookup_error): Improve diagnostic
when the scope is the current instantiation.
* parser.cc (cp_parser_diagnose_invalid_type_name): Likewise.
(cp_parser_conversion_function_id): Don't call push_scope on
a template scope unless we're in a declaration context.
(cp_parser_lookup_name): Restrict name lookup failure
punting to dependentish_scope_p instead of depedent_type_p.
* semantics.cc (finish_id_expression_1): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.
libstdc++-v3/ChangeLog:
* include/experimental/socket
(basic_socket_iostream::basic_socket_iostream): Fix typo.
* include/tr2/dynamic_bitset
(__dynamic_bitset_base::_M_is_proper_subset_of): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/alignas18.C: Expect name lookup error for U::X.
* g++.dg/cpp0x/forw_enum13.C: Expect name lookup error for
D3::A and D4<T>::A.
* g++.dg/parse/access13.C: Declare A::E::V to avoid name lookup
failure and preserve intent of the test.
* g++.dg/parse/enum11.C: Expect extra errors, matching the
non-template case.
* g++.dg/template/crash123.C: Avoid name lookup failure to
preserve intent of the test.
* g++.dg/template/crash124.C: Likewise.
* g++.dg/template/crash7.C: Adjust expected diagnostics.
* g++.dg/template/dtor6.C: Declare A::~A() to avoid name lookup
failure and preserve intent of the test.
* g++.dg/template/error22.C: Adjust expected diagnostics.
* g++.dg/template/static30.C: Avoid name lookup failure to
preserve intent of the test.
* g++.old-deja/g++.other/decl5.C: Adjust expected diagnostics.
* g++.dg/template/non-dependent34.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
The tests FAIL on i686-linux due to unexpected -Wpsabi diagnostics.
Fixed as usually by adding -Wno-psabi to dg-options.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111150
* gcc.dg/tree-ssa/pr111150.c: Add -Wno-psabi to dg-options.
* g++.dg/tree-ssa/pr111150.C: Likewise.
lmap was introduced in tcl 8.6, and while it was released in 2012, lmap
does not really make too much of a difference to warrant the friction on
consverative (and relevant) systems.
gcc/testsuite/ChangeLog:
* lib/gcov.exp: Use foreach, not lmap, for tcl <= 8.5 compat.
In this PR, canonicalize_move_range walked off the end of a list
and triggered a null dereference. There are multiple ways of fixing
that, but I think the approach taken in the patch should be
relatively efficient.
gcc/
PR rtl-optimization/115929
* rtl-ssa/movement.h (canonicalize_move_range): Check for null prev
and next insns and create an invalid move range for them.
gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-2.c: New test.
One of the goals of the rtl-ssa representation was to allow a
group of consecutive clobbers to be skipped in constant time,
with amortised sublinear insertion and deletion. This involves
putting consecutive clobbers in groups. Splitting or joining
groups would be linear if we had to update every clobber on
each update, so the operation to query a clobber's group is
lazy and (again) amortised sublinear.
This means that, when splitting a group into two, we cannot
reuse the old group for one side. We have to invalidate it,
so that the lazy clobber_info::group query can tell that something
has changed. The ICE in the PR came from failing to do that.
gcc/
PR rtl-optimization/115928
* rtl-ssa/accesses.h (clobber_group): Add a new constructor that
takes the first, last and root clobbers.
* rtl-ssa/internals.inl (clobber_group::clobber_group): Define it.
* rtl-ssa/accesses.cc (function_info::split_clobber_group): Use it.
Allocate a new group for both sides and invalidate the previous group.
(function_info::add_def): After calling split_clobber_group,
remove the old group from the splay tree.
gcc/testsuite/
PR rtl-optimization/115928
* gcc.dg/torture/pr115928.c: New test.
genattrtab printed an "enum" tag before references to attribute
enums, but that's redundant in C++. Removing it means that each
attribute type becomes a single token and can be easily stored
in the attr_desc structure.
gcc/
* genattrtab.cc (attr_desc::cxx_type): New field.
(write_attr_get, write_attr_value): Use it.
(gen_attr, find_attr, make_internal_attr): Initialize it,
dropping enum tags.
In r14-409, we started handling empty bases first in cxx_fold_indirect_ref_1
so that we don't need to recurse and waste time.
This caused a bogus "modifying a const object" error. I'm appending my
analysis from the PR, but basically, cxx_fold_indirect_ref now returns
a different object than before, and we mark the wrong thing as const,
but since we're initializing an empty object, we should avoid setting
the object constness.
~~
Pre-r14-409: we're evaluating the call to C::C(), which is in the body of
B::B(), which is the body of D::D(&d):
C::C ((struct C *) this, NON_LVALUE_EXPR <0>)
It's a ctor so we get here:
3118 /* Remember the object we are constructing or destructing. */
3119 tree new_obj = NULL_TREE;
3120 if (DECL_CONSTRUCTOR_P (fun) || DECL_DESTRUCTOR_P (fun))
3121 {
3122 /* In a cdtor, it should be the first `this' argument.
3123 At this point it has already been evaluated in the call
3124 to cxx_bind_parameters_in_call. */
3125 new_obj = TREE_VEC_ELT (new_call.bindings, 0);
new_obj=(struct C *) &d.D.2656
3126 new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), new_obj);
new_obj=d.D.2656.D.2597
We proceed to evaluate the call, then we get here:
3317 /* At this point, the object's constructor will have run, so
3318 the object is no longer under construction, and its possible
3319 'const' semantics now apply. Make a note of this fact by
3320 marking the CONSTRUCTOR TREE_READONLY. */
3321 if (new_obj && DECL_CONSTRUCTOR_P (fun))
3322 cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/true,
3323 non_constant_p, overflow_p);
new_obj is still d.D.2656.D.2597, its type is "C", cxx_set_object_constness
doesn't set anything as const. This is fine.
After r14-409: on line 3125, new_obj is (struct C *) &d.D.2656 as before,
but we go to cxx_fold_indirect_ref_1:
5739 if (is_empty_class (type)
5740 && CLASS_TYPE_P (optype)
5741 && lookup_base (optype, type, ba_any, NULL, tf_none, off))
5742 {
5743 if (empty_base)
5744 *empty_base = true;
5745 return op;
type is C, which is an empty class; optype is "const D", and C is a base of D.
So we return the VAR_DECL 'd'. Then we get to cxx_set_object_constness with
object=d, which is const, so we mark the constructor READONLY.
Then we're evaluating A::A() which has
((A*)this)->data = 0;
we evaluate the LHS to d.D.2656.a, for which the initializer is
{.D.2656={.a={.data=}}} which is TREE_READONLY and 'd' is const, so we think
we're modifying a const object and fail the constexpr evaluation.
PR c++/115900
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Set new_obj to NULL_TREE
if cxx_fold_indirect_ref set empty_base to true.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/constexpr-init23.C: New test.
The C + F extention implies the zcf extension on rv32. Add missing zcf
extension for the rv32 target.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/target-attr-16.c: Update expected assembly
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
`vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.
The patch adds match pattern for a, b:
(a ? x : y) != (b ? x : y) --> (a^b) ? TRUE : FALSE
(a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
(a ? x : y) != (b ? y : x) --> (a^b) ? TRUE : FALSE
(a ? x : y) == (b ? y : x) --> (a^b) ? FALSE : TRUE
PR tree-optimization/111150
gcc/ChangeLog:
* match.pd (`(a ? x : y) eq/ne (b ? x : y)`): New pattern.
(`(a ? x : y) eq/ne (b ? y : x)`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr111150.c: New test.
* gcc.dg/tree-ssa/pr111150-1.c: New test.
* g++.dg/tree-ssa/pr111150.C: New test.
Signed-off-by: Eikansh Gupta <quic_eikagupt@quicinc.com>
Like r15-1610-gb6215065a5b143 (which adds one for late_combine),
adding one for ext_dce is useful to debug some issues with this pass.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* dbgcnt.def (ext_dce): New debug counter.
* ext-dce.cc (ext_dce_try_optimize_insn): Reject the insn
if the debug counter says so.
(ext_dce): Rename to ...
(ext_dce_execute): This.
(pass_ext_dce::execute): Update for the name of ext_dce.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.
PR target/115526
gcc/ChangeLog:
* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_<tls>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/alpha/pr115526.c: New test.
The addition of -Wunterminated-string-initialization should have
regenerated the c.opt.urls file.
Fixes: 44c9403ed1 ("c, objc: Add -Wunterminated-string-initialization")
gcc/c-family/ChangeLog:
* c.opt.urls: Regenerate.
This patch introduces a new insn that works as an insn combine
pattern for
(plus:HI (zero_extend:HI (reg:QI))
(const_0mod256_operannd:HI))
which requires at most 2 instructions. When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled).
gcc/
PR target/90616
* config/avr/predicates.md (const_0mod256_operand): New predicate.
* config/avr/constraints.md (Cp8): New constraint.
* config/avr/avr.md (*aligned_add_symbol): New insn.
* config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
When op2 is a multiple of 256, there is no need to add / subtract
the lo8 part.
(avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
new insn *aligned_add_symbol as it applies.
The following testcase ICEs on x86_64-linux, because we try to
gsi_insert_on_edge_immediate a statement on an edge which already has
statements queued with gsi_insert_on_edge, and the deferral has been
intentional so that we don't need to deal with cfg changes in between.
The following patch uses the delayed insertion as well.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115887
* gimple-lower-bitint.cc (gimple_lower_bitint): Use gsi_insert_on_edge
instead of gsi_insert_on_edge_immediate and set edge_insertions to
true.
* gcc.dg/bitint-108.c: New test.
When not using .base64 directive, we emit for long sequences of zeros
.string "foobarbaz"
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
The following patch changes that to
.string "foobarbaz"
.zero 12
It keeps emitting .string "" if there is just one zero or two zeros where
the first one is preceded by non-zeros, so we can have
.string "foobarbaz"
.string ""
or
.base64 "VG8gYmUgb3Igbm90IHRvIGJlLCB0aGF0IGlzIHRoZSBxdWVzdGlvbg=="
.string ""
but not 2 .string "" in a row.
On a testcase I have with around 310440 0-255 unsigned char character
constants mostly derived from cc1plus start but with too long sequences of
0s which broke transformation to STRING_CST adjusted to have at most 126
consecutive 0s, I see:
1504498 bytes long assembly without this patch on i686-linux (without
.base64 support in binutils)
1155071 bytes long assembly with this patch on i686-linux (without .base64
support in binutils)
431390 bytes long assembly without this patch on x86_64-linux (with
.base64 support in binutils)
427593 bytes long assembly with this patch on x86_64-linux (with .base64
support in binutils)
All 4 assemble to identical *.o file when using x86_64-linux .base64
supporting gas, and the former 2 when using older x86_64-linux gas assemble
to identical content as well.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
* varasm.cc (default_elf_asm_output_ascii): Use ASM_OUTPUT_SKIP instead
of 2 or more default_elf_asm_output_limited_string (f, "") calls and
adjust base64 heuristics correspondingly.
As shown in PR115936 SCEV and IVOPTS create an invalidate IV when the IV is
a pointer type:
ivtmp.39_65 = ivtmp.39_59 + 0B;
where the IVs are DI mode and the offset is a pointer.
This comes from this weird candidate:
Candidate 8:
Var befor: ivtmp.39_59
Var after: ivtmp.39_65
Incr POS: before exit test
IV struct:
Type: sizetype
Base: 0
Step: 0B
Biv: N
Overflowness wrto loop niter: No-overflow
This IV was always created just ended up not being used.
This is created by SCEV.
simple_iv_with_niters in the case where no CHREC is found creates an IV with
base == ev, offset == 0;
however in this case EV is a POINTER_PLUS_EXPR and so the type is a pointer.
it ends up creating an unusable expression.
gcc/ChangeLog:
PR tree-optimization/115936
* tree-scalar-evolution.cc (simple_iv_with_niters): Use sizetype for
pointers.
maybe_new_partial_specialization wasn't propagating TYPE_CONTEXT when
creating a new class type corresponding to a constrained partial spec,
which do_friend relies on via template_class_depth to distinguish a
template friend from a non-template friend, and so in the below testcase
we were incorrectly instantiating the non-template operator+ as if it
were a template leading to an ICE.
PR c++/111890
gcc/cp/ChangeLog:
* pt.cc (maybe_new_partial_specialization): Propagate TYPE_CONTEXT
to the newly created partial specialization.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-partial-spec15.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Vector stmts number of an operation is calculated based on output vectype.
This is over-estimated for lane-reducing operation, which would cause vector
def/use mismatched when we want to support loop reduction mixed with lane-
reducing and normal operations. One solution is to refit lane-reducing
to make it behave like a normal one, by adding new pass-through copies to
fix possible def/use gap. And resultant superfluous statements could be
optimized away after vectorization. For example:
int sum = 1;
for (i)
{
sum += d0[i] * d1[i]; // dot-prod <vector(16) char>
}
The vector size is 128-bit,vectorization factor is 16. Reduction
statements would be transformed as:
vector<4> int sum_v0 = { 0, 0, 0, 1 };
vector<4> int sum_v1 = { 0, 0, 0, 0 };
vector<4> int sum_v2 = { 0, 0, 0, 0 };
vector<4> int sum_v3 = { 0, 0, 0, 0 };
for (i / 16)
{
sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
sum_v1 = sum_v1; // copy
sum_v2 = sum_v2; // copy
sum_v3 = sum_v3; // copy
}
sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3; // = sum_v0
2024-07-02 Feng Xue <fxue@os.amperecomputing.com>
gcc/
* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
Calculate effective vector stmts number with generic
vect_get_num_copies.
(vect_transform_reduction): Insert copies for lane-reducing so as to
fix over-estimated vector stmts number.
(vect_transform_cycle_phi): Calculate vector PHI number only based on
output vectype.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Remove
adjustment on vector stmts number specific to slp reduction.
Extend original vect_get_num_copies (pure loop-based) to calculate number of
vector stmts for slp node regarding a generic vect region.
2024-07-12 Feng Xue <fxue@os.amperecomputing.com>
gcc/
* tree-vectorizer.h (vect_get_num_copies): New overload function.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Calculate
number of vector stmts for slp node with vect_get_num_copies.
(vect_slp_analyze_node_operations): Calculate number of vector elements
for constant/external slp node with vect_get_num_copies.
The following fixes how during reduction epilogue generation we
gather conditional compares for condition reductions, thereby
following the reduction chain via STMT_VINFO_REDUC_IDX. The issue
is that SLP nodes for COND_EXPRs can have either three or four
children dependent on whether we have legacy GENERIC expressions
in the transitional pattern GIMPLE for the COND_EXPR condition.
PR tree-optimization/115959
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Get at the REDUC_IDX child in a safer way for COND_EXPR
nodes.
* gcc.dg/vect/pr115959.c: New testcase.
This is another test which clearly has been written with the assumption that
it will be executed, but it isn't.
It works fine when it is executed on both x86_64-linux and i686-linux.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/torture/builtin-convertvector-1.c: Add dg-do run
directive.
Apparently there is a -Wsign-compare warning if ptrdiff_t has precision of
int, then (t - s + 1 + 2) / 3 * 4 has int type while cnt unsigned int.
This doesn't warn if ptrdiff_t has larger precision, say on x86_64
it is 64-bit and so (t - s + 1 + 2) / 3 * 4 has long type and cnt unsigned
int. And it doesn't warn when using older binutils (in my tests I've
used new binutils on x86_64 and old binutils on i686).
Anyway, earlier condition guarantees that t - s is at most 256-ish and
t >= s by construction, so we can just cast it to (unsigned) to avoid
the warning.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR other/115958
* varasm.cc (default_elf_asm_output_ascii): Cast t - s to unsigned
to avoid -Wsign-compare warnings.
The builtin-clear-padding-6.c testcase fails as clear_padding_type
doesn't correctly recompute the buf->size and buf->off members after
expanding clearing of an array using a runtime loop.
buf->size should be in that case the offset after which it should continue
with next members or padding before them modulo UNITS_PER_WORD and
buf->off that offset minus buf->size. That is what the code was doing,
but with off being the start of the loop cleared array, not its end.
So, the last hunk in gimple-fold.cc fixes that.
When adding the testcase, I've noticed that the
c-c++-common/torture/builtin-clear-padding-* tests, although clearly
written as runtime tests to test the builtins at runtime, didn't have
{ dg-do run } directive and were just compile tests because of that.
When adding that to the tests, builtin-clear-padding-1.c was already
failing without that clear_padding_type hunk too, but
builtin-clear-padding-5.c was still failing even after the change.
That is due to a bug in clear_padding_flush which the patch fixes as
well - when clear_padding_flush is called with full=true (that happens
at the end of the whole __builtin_clear_padding or on those array
padding clears done by a runtime loop), it wants to flush all the pending
padding clearings rather than just some. If it is at the end of the whole
object, it decreases wordsize when needed to make sure the code never writes
including RMW cycles to something outside of the object:
if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
> (unsigned HOST_WIDE_INT) buf->sz)
{
gcc_assert (wordsize > 1);
wordsize /= 2;
i -= wordsize;
continue;
}
but if it is full==true flush in the middle, this doesn't happen, but we
still process just the buffer bytes before the current end. If that end
is not on a wordsize boundary, e.g. on the builtin-clear-padding-5.c test
the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18,
nonzero_last might be equal to the end - i, i.e. 2 here, but still all_ones
might be true, so in some spots we just didn't emit any clearing in that
last chunk.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115527
* gimple-fold.cc (clear_padding_flush): Introduce endsize
variable and use it instead of wordsize when comparing it against
nonzero_last.
(clear_padding_type): Increment off by sz.
* c-c++-common/torture/builtin-clear-padding-1.c: Add dg-do run
directive.
* c-c++-common/torture/builtin-clear-padding-2.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-3.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-4.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-5.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-6.c: New test.
Currently for 128 bit floating-point ibm128 and ieee128
formats conversion, the corresponding libcalls are:
ibm128 -> ieee128 "__trunctfkf2"
ieee128 -> ibm128 "__extendkftf2"
, and generic code handling (like convert_mode_scalar) also
adopts sext_optab for ieee128 -> ibm128 while trunc_optab
for ibm128 -> ieee128. But in rs6000 port as function
rs6000_expand_float128_convert and init_float128_ieee show,
we adopt sext_optab for ibm128 -> ieee128 with "__trunctfkf2"
while trunc_optab for ieee128 -> ibm128 with "__extendkftf2".
To make them consistent and avoid some surprises, this patch
is to adjust rs6000 internal handlings by adopting trunc_optab
for ibm128 -> ieee128 with "__trunctfkf2" while sext_optab for
ieee128 -> ibm128 with "__extendkftf2".
gcc/ChangeLog:
* config/rs6000/rs6000.cc (init_float128_ieee): Use trunc_optab rather
than sext_optab for converting FLOAT128_IBM_P mode to FLOAT128_IEEE_P
mode, and use sext_optab rather than trunc_optab for converting
FLOAT128_IEEE_P mode to FLOAT128_IBM_P mode.
(rs6000_expand_float128_convert): Likewise.
The fix for PR112993 makes KFmode have 128 bit mode precision,
we don't need this workaround to fix up the type precision any
more, and just go with mode precision. So this patch is to
remove KFmode workaround.
PR target/112993
gcc/ChangeLog:
* tree.cc (build_common_tree_nodes): Drop the workaround for rs6000
KFmode precision adjustment.
This reverts commit r14-6478-gfda8e2f8292a90 "range:
Workaround different type precision between _Float128 and
long double [PR112788]" as the fixes for PR112993 make
all 128 bits scalar floating point have the same 128 bit
precision, this workaround isn't needed any more.
PR target/112993
gcc/ChangeLog:
* value-range.h (range_compatible_p): Remove the workaround on
different type precision between _Float128 and long double.
Previously effective target fortran_real_c_float128 never
passes on Power regardless of the default 128 long double
is ibmlongdouble or ieeelongdouble. It's due to that TF
mode is always used for kind 16 real, which has precision
127, while the node float128_type_node for c_float128 has
128 type precision, get_real_kind_from_node can't find a
matching as it only checks gfc_real_kinds[i].mode_precision
and type precision.
With changing TFmode/IFmode/KFmode to have the same mode
precision 128, now fortran_real_c_float12 can pass with
ieeelongdouble enabled by default and test cases guarded
with it get tested accordingly. But with ibmlongdouble
enabled by default, since TFmode has precision 128 which
is the same as type precision 128 of float128_type_node,
get_real_kind_from_node considers kind for TFmode matches
float128_type_node, but it's wrong as at this time point
TFmode is with ibm extended format. So this patch is to
teach get_real_kind_from_node to check one more field which
can be differentiable from the underlying real format, it
can avoid the unexpected matching when there more than one
modes have the same precisoin.
PR target/112993
gcc/fortran/ChangeLog:
* trans-types.cc (get_real_kind_from_node): Consider the case where
more than one modes have the same precision.
On rs6000, there are three 128 bit scalar floating point
modes TFmode, IFmode and KFmode. With some historical
reasons, we defines them with different mode precisions,
that is KFmode 126, TFmode 127 and IFmode 128. But in
fact all of them should have the same mode precision 128,
this special setting has caused some issues like some
unexpected failures mentioned in [1] and also made us have
to introduce some workarounds, such as: the workaround in
build_common_tree_nodes for KFmode 126, the workaround in
range_compatible_p for same mode but different precision
issue.
This patch is to make these three 128 bit scalar floating
point modes TFmode, IFmode and KFmode have 128 bit mode
precision, and keep the order same as previous in order
to make machine independent parts of the compiler not try
to widen IFmode to TFmode. Besides, build_common_tree_nodes
adopts the newly added hook mode_for_floating_type so we
don't need to worry about unexpected mode for long double
type node.
In function convert_mode_scalar, with the proposed change,
it adopts sext_optab for converting ieee128 format mode to
ibm128 format mode while trunc_optab for converting ibm128
format mode to ieee128 format mode. Thus this patch removes
useless extend and trunc optab supports, supplements new
define_expands expandkftf2 and trunctfkf2 to align with
convert_mode_scalar implementation. It also unnames two
define_insn_and_split to avoid conflicts and make them more
clear. Considering the current implementation that there is
no chance to have KF <-> IF conversion (since either of them
would be TF already), it adds two dummy define_expands to
assert this.
[1] https://inbox.sourceware.org/gcc-patches/
718677e7-614d-7977-312d-05a75e1fd5b4@linux.ibm.com/
PR target/112993
gcc/ChangeLog:
* config/rs6000/rs6000-modes.def (IFmode, KFmode, TFmode): Define
with FLOAT_MODE instead of FRACTIONAL_FLOAT_MODE, don't use special
precisions any more.
(rs6000-modes.h): Remove include.
* config/rs6000/rs6000-modes.h: Remove.
* config/rs6000/rs6000.h (rs6000-modes.h): Remove include.
* config/rs6000/t-rs6000: Remove rs6000-modes.h include.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
all uses of FLOAT_PRECISION_TFmode with 128.
(rs6000_c_mode_for_floating_type): Likewise.
* config/rs6000/rs6000.md (define_expand extendiftf2): Remove.
(define_expand extendifkf2): Remove.
(define_expand extendtfkf2): Remove.
(define_expand trunckftf2): Remove.
(define_expand trunctfif2): Remove.
(define_expand extendtfif2): Add new assertion.
(define_expand expandkftf2): New.
(define_expand trunciftf2): Add new assertion.
(define_expand trunctfkf2): New.
(define_expand truncifkf2): Change with gcc_unreachable.
(define_expand expandkfif2): New.
(define_insn_and_split extendkftf2): Rename to ...
(define_insn_and_split *extendkftf2): ... this.
(define_insn_and_split trunctfkf2): Rename to ...
(define_insn_and_split *extendtfkf2): ... this.
With some historical reasons, rs6000 defines KFmode, TFmode
and IFmode to have different mode precisions, but it causes
some issues and needs some workarounds such as PR112993.
So we are going to make all rs6000 128 bit scalar FP modes
have 128 bit precision. Be prepared for that, this patch
is to make function convert_mode_scalar allow same precision
FP modes conversion if their underlying formats are
ibm_extended_format and ieee_quad_format respectively, just
like the existing special treatment on arm_bfloat_half_format
<-> ieee_half_format. It also factors out all the relevant
checks into a lambda function. Besides, similar to ieee fp16
-> bfloat conversion, it adopts trunc_optab rather than
sext_optab for ibm128 to ieee128 conversion.
PR target/112993
gcc/ChangeLog:
* expr.cc (convert_mode_scalar): Allow same precision conversion
between scalar floating point modes if whose underlying format is
ibm_extended_format or ieee_quad_format, and refactor assertion
with new lambda function acceptable_same_precision_modes. Use
trunc_optab rather than sext_optab for ibm128 to ieee128 conversion.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Use trunc_optab rather
than sext_optab for ibm128 to ieee128 conversion.
We currently silently ignore the -mrop-protect option for old CPUs we don't
support with the ROP hash insns, but we throw an error for unsupported ABIs.
This patch treats unsupported CPUs and ABIs similarly by throwing an error
both both. This matches clang behavior and allows us to simplify our tests
in the code that generates our prologue and epilogue code.
2024-06-26 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/114759
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Disallow
CPUs and ABIs that do no support the ROP protection insns.
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Remove now
unneeded tests.
(rs6000_emit_prologue): Likewise.
Remove unneeded gcc_assert.
(rs6000_emit_epilogue): Likewise.
* config/rs6000/rs6000.md: Likewise.
gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-3.c: New test.
We currently only emit the ROP-protect hash* insns for Power10, where the
insns were added to the architecture. We want to emit them for earlier
cpus (where they operate as NOPs), so that if those older binaries are
ever executed on a Power10, then they'll be protected from ROP attacks.
Binutils accepts hashst and hashchk back to Power8, so change GCC to emit
them for Power8 and later. This matches clang's behavior.
2024-06-19 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/114759
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Use TARGET_POWER8.
(rs6000_emit_prologue): Likewise.
* config/rs6000/rs6000.md (hashchk): Likewise.
(hashst): Likewise.
Fix whitespace.
gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-2.c: New test.
* lib/target-supports.exp (rop_ok): Use
check_effective_target_has_arch_pwr8.
When importing modules, when a binding vector for a name runs out of
slots it gets reallocated with a larger size, and existing bindings are
copied across. However, the flags to indicate whether deduping needs to
occur did not: this causes ICEs, as it allows a duplicate binding to be
added which then violates assumptions later on.
PR c++/99242
gcc/cp/ChangeLog:
* name-lookup.cc (append_imported_binding_slot): Propagate dups
flags.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr99242_a.H: New test.
* g++.dg/modules/pr99242_b.H: New test.
* g++.dg/modules/pr99242_c.H: New test.
* g++.dg/modules/pr99242_d.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>