Commit Graph

201823 Commits

Author SHA1 Message Date
Tejas Belagod
22d85c10a0 AArch64: [PR96339] Optimise svlast[ab]
This PR optimizes an SVE intrinsics sequence where
    svlasta (svptrue_pat_b8 (SV_VL1), x)
  a scalar is selected based on a constant predicate and a variable vector.
  This sequence is optimized to return the corresponding element of a NEON
  vector. For example,
    svlasta (svptrue_pat_b8 (SV_VL1), x)
  returns
    umov    w0, v0.b[1]
  Likewise,
    svlastb (svptrue_pat_b8 (SV_VL1), x)
  returns
     umov    w0, v0.b[0]
  This optimization only works provided the constant predicate maps to a range
  that is within the bounds of a 128-bit NEON register.
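
The selection rule described above can be modelled in scalar C. This is only a sketch, not the GCC fold: the helper names (last_active, model_lasta, model_lastb) are hypothetical, and the handling of an all-false predicate and of wrap-around is my reading of the LASTA/LASTB instruction semantics rather than something stated in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the last active lane, or -1 if the predicate is all-false.  */
static ptrdiff_t last_active(const uint8_t *pred, size_t n)
{
  for (size_t i = n; i-- > 0;)
    if (pred[i])
      return (ptrdiff_t) i;
  return -1;
}

/* Model of svlastb: the last active element, or the final element if no
   lane is active (assumed behaviour).  */
static uint8_t model_lastb(const uint8_t *pred, const uint8_t *x, size_t n)
{
  assert(n > 0);
  ptrdiff_t last = last_active(pred, n);
  return x[last < 0 ? n - 1 : (size_t) last];
}

/* Model of svlasta: the element after the last active one, wrapping to
   element 0 (assumed behaviour for the wrap and the all-false case).  */
static uint8_t model_lasta(const uint8_t *pred, const uint8_t *x, size_t n)
{
  assert(n > 0);
  ptrdiff_t last = last_active(pred, n);
  return x[(size_t) (last + 1) % n];
}
```

With a predicate in which only lane 0 is active (the SV_VL1 case), model_lastb picks lane 0 and model_lasta picks lane 1, which matches the umov lane indices quoted above.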

gcc/ChangeLog:

	PR target/96339
	* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold sve
	calls that have a constant input predicate vector.
	(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
	(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
	(svlast_impl::vect_all_same): Check if all vector elements are equal.

gcc/testsuite/ChangeLog:

	PR target/96339
	* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
	* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
	* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
	* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
	to expect optimized code for function body.
	* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
2023-06-13 07:57:36 +01:00
Andi Kleen
950fa8552b Update perf auto profile script
- Fix gen_autofdo_event: The download URL for the Intel Perfmon Event
  list has changed, as well as the JSON format.
  Also it now uses pattern matching to match CPUs. Update the script to support all of this.
- Regenerate gcc-auto-profile with the latest published Intel model
  numbers, so it works with recent systems.
- So far it's still broken on hybrid systems

contrib/ChangeLog:

	* gen_autofdo_event.py: Update for download server changes

gcc/ChangeLog

	* config/i386/gcc-auto-profile: Regenerate.
2023-06-12 19:22:22 -07:00
Juzhe-Zhong
de5f3bbf95 RISC-V: Fix V_WHOLE && V_FRACT iterator requirement
This patch fixes the requirement of V_WHOLE and V_FRACT.
E.g. VNx8QI in V_WHOLE has no requirement which is incorrect.
     Actually, VNx8QI should be whole(full) mode when TARGET_MIN_VLEN < 128
     since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2 which is fractional
     vector.

Co-Authored by: Robin Dapp <rdapp@ventanamicro.com>

gcc/ChangeLog:

	* config/riscv/vector-iterators.md: Fix requirement.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: New test.
2023-06-13 09:28:04 +08:00
Juzhe-Zhong
d150afb479 RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation
According to RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

We can enhance VLA SLP auto-vectorization with the decompress operation
(section 16.5.1, Synthesizing vdecompress).

Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1, POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });

Case 2 (nunits = POLY_INT_CST [16, 16]):
_23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3], POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1], POLY_INT_CST [5, 3], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 nunits), mask = { 0, 1, 0, 1, ... });

For example:
void __attribute__ ((noinline, noclone))
vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 2] += b;
      a[i * 2 + 1] += c;
    }
}

ASM:
...
        vid.v   v0
        vand.vi v0,v0,1
        vmseq.vi        v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
vdecompress:
        viota.m v3,v0
        vrgather.vv     v2,v1,v3,v0.t
Loop:
        vsetvli zero,a5,e64,m1,ta,ma
        vle64.v v1,0(a0)
        vsetvli a6,zero,e64,m1,ta,ma
        vadd.vv v1,v2,v1
        vsetvli zero,a5,e64,m1,ta,ma
        mv      a5,a3
        vse64.v v1,0(a0)
        add     a3,a3,a1
        add     a0,a0,a2
        bgtu    a5,a4,.L4
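
The viota.m plus masked vrgather.vv pair in the listing can be modelled in scalar C for a fixed lane count. This is a sketch under assumptions: the function names are hypothetical, and filling the unmasked lanes with an ordinary gather from the first operand using the same indices is an assumption of this model, chosen so that the result matches the Case 1 permutation { 0, nunits, 1, nunits + 1, ... }.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* viota.m model: sel[i] = number of set mask bits at indices below i.  */
static void model_viota(const uint8_t *mask, size_t *sel, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    sel[i] = count;
    count += mask[i] ? 1 : 0;
  }
}

/* Decompress model: gather op0 through sel into every lane, then gather
   op1 through sel into the masked lanes only.  dst must not alias op0
   or op1.  */
static void model_decompress(uint64_t *dst, const uint64_t *op0,
                             const uint64_t *op1, const uint8_t *mask,
                             size_t n)
{
  enum { MAX_LANES = 64 };   /* arbitrary bound for this sketch */
  size_t sel[MAX_LANES];
  assert(n <= MAX_LANES);
  model_viota(mask, sel, n);
  for (size_t i = 0; i < n; i++)
    dst[i] = op0[sel[i]];            /* unmasked gather */
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      dst[i] = op1[sel[i]];          /* masked gather, other lanes kept */
}
```

For four lanes, op0 = {10, 11, 12, 13}, op1 = {20, 21, 22, 23} and mask = {0, 1, 0, 1}, the indices are {0, 0, 1, 1} and the result interleaves the low halves of both inputs.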

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
	(shuffle_decompress_patterns): New function.
	(expand_vec_perm_const_1): Add decompress optimization.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.
2023-06-13 09:10:33 +08:00
GCC Administrator
9d250bdb88 Daily bump. 2023-06-13 00:17:29 +00:00
Gaius Mulley
8089f26b94 PR modula2/110189 Using an unknown TYPE as argument to VAL gives ICE
This patch tidies P3Build.bnf and fixes error format specs in
M2Quads.mod when encountering unknown symbols.

gcc/m2/ChangeLog:

	PR modula2/110189
	* gm2-compiler/M2Quads.mod (BuildAbsFunction): Replace abort
	format specifier.
	(BuildValFunction): Replace abort format specifier.
	(BuildCastFunction): Replace abort format specifier.
	(BuildMinFunction): Replace abort format specifier.
	(BuildMaxFunction): Replace abort format specifier.
	(BuildTruncFunction): Replace abort format specifier.
	* gm2-compiler/P3Build.bnf (Pass1): Remove.
	(Pass2): Remove.
	(Pass3): Remove.
	(Expect): Add Pass1.
	(AsmStatement): Remove Pass3.
	(AsmOperands): Remove Pass3.
	(AsmOperandSpec): Remove Pass3.
	(AsmInputElement): Remove Pass3.
	(AsmOutputElement): Remove Pass3.
	(AsmTrashList): Remove Pass3.

gcc/testsuite/ChangeLog:

	PR modula2/110189
	* gm2/pim/fail/foovaltype.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-06-13 00:53:53 +01:00
Jeff Law
ae193f9008 [committed] [PR rtl-optimization/101188] Fix reload_cse_move2add ignoring clobbers
So as Georg-Johann discusses in the BZ, reload_cse_move2add can generate
incorrect code when optimizing code with clobbers.  Specifically in the
case where we try to optimize a sequence of 4 operations down to 3
operations we can reset INSN to the next instruction and continue the loop.

That skips the code to invalidate objects based on things like REG_INC
nodes, stack pushes and most importantly clobbers attached to the current
insn.

This patch factors all of the invalidation code used by reload_cse_move2add
into a new function and calls it at the appropriate time.

Georg-Johann has confirmed this patch fixes his avr bug and I've had it in
my tester over the weekend.  It's bootstrapped and regression tested on
aarch64, m68k, sh4, alpha and hppa.  It's also regression tested successfully
on a wide variety of other targets.

gcc/
	PR rtl-optimization/101188
	* postreload.cc (reload_cse_move2add_invalidate): New function,
	extracted from...
	(reload_cse_move2add): Call reload_cse_move2add_invalidate.

gcc/testsuite
	PR rtl-optimization/101188
	* gcc.c-torture/execute/pr101188.c: New test
2023-06-12 12:52:10 -06:00
Prathamesh Kulkarni
9eb757d117 [aarch64] Improve code-gen for vector initialization with single constant element.
gcc/ChangeLog:
	* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
	if (n_var == n_elts && n_elts <= 16) to allow a single constant,
	and if maxv == 1, use constant element for duplicating into register.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/vec-init-single-const.c: New test.
	* gcc.target/aarch64/vec-init-single-const-be.c: Likewise.
	* gcc.target/aarch64/vec-init-single-const-2.c: Likewise.
2023-06-12 23:18:40 +05:30
Tobias Burnus
38944ec2a6 OpenMP: Cleanups related to the 'present' modifier
Reduce number of enum values passed to libgomp as
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as
GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore);
that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also
abort if not present but copy data when present. This is a follow-up to
the commit r14-1579-g4ede915d5dde93 done 6 days ago.

Additionally, the commit improves a libgomp run-time and a C/C++ compile-time
error wording and extends testcases a tiny bit.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_map): Reword error message for
	clearness especially with 'omp target (enter/exit) data.'

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_map): Reword error message for
	clearness especially with 'omp target (enter/exit) data.'
	* semantics.cc (handle_omp_array_sections): Handle
	GOMP_MAP_{ALWAYS_,}PRESENT_{TO,TOFROM,FROM,ALLOC} enum values.

gcc/ChangeLog:

	* gimplify.cc (gimplify_adjust_omp_clauses_1): Use
	GOMP_MAP_FORCE_PRESENT for 'present alloc' implicit mapping.
	(gimplify_adjust_omp_clauses): Change
	GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to the equivalent
	GOMP_MAP_FORCE_PRESENT.
	* omp-low.cc (lower_omp_target): Remove handling of no-longer valid
	GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC}; update map kinds used for
	to/from clauses with present modifier.

include/ChangeLog:

	* gomp-constants.h (enum gomp_map_kind): Change the enum values
	GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to be compiler only.
	(GOMP_MAP_PRESENT_P): Update to include also GOMP_MAP_FORCE_PRESENT.

libgomp/ChangeLog:

	* target.c (gomp_to_device_kind_p, gomp_map_vars_internal): Replace
	GOMP_MAP_PRESENT_{FROM,TO,TOFROM,ALLOC} by GOMP_MAP_FORCE_PRESENT.
	(gomp_map_vars_internal, gomp_update): Likewise; unify and improve
	error message.
	* testsuite/libgomp.c-c++-common/target-present-2.c: Update for
	changed error message.
	* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
	* testsuite/libgomp.fortran/target-present-2.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise.
	* testsuite/libgomp.c-c++-common/target-present-1.c: Likewise and
	extend testcase to check that data is copied when needed.
	* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
	* testsuite/libgomp.fortran/target-present-3.f90: Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/defaultmap-4.c: Update scan-tree-dump.
	* c-c++-common/gomp/map-9.c: Likewise.
	* gfortran.dg/gomp/defaultmap-8.f90: Likewise.
	* gfortran.dg/gomp/map-11.f90: Likewise.
	* gfortran.dg/gomp/target-update-1.f90: Likewise.
	* gfortran.dg/gomp/map-12.f90: Likewise; also check original dump.
	* c-c++-common/gomp/map-6.c: Update dg-error and also check
	clause error with 'target (enter/exit) data'.
2023-06-12 18:15:28 +02:00
Andrew MacLeod
0ddc8c7871 Add some overrides.
PR tree-optimization/110205
	* range-op-float.cc (range_operator::fold_range): Add default FII
	fold routine.
	* range-op-mixed.h (class operator_gt): Add missing final overrides.
	* range-op.cc (range_op_handler::fold_range): Add RO_FII case.
	(operator_lshift ::update_bitmask): Add final override.
	(operator_rshift ::update_bitmask): Add final override.
	* range-op.h (range_operator::fold_range): Add FII prototype.
2023-06-12 11:30:51 -04:00
Andrew MacLeod
5410b07a8c Provide interface for non-standard operators.
This removes the hack introduced for WIDEN_MULT, which exported a pointer
to the operator and had gimple-range-op.cc set the operator to this
pointer when it was appropriate.

Instead, we simply change the range-op table to be unsigned indexed,
and add new opcodes to the end of the table, allowing them to be indexed
directly via range_op_handler::range_op.
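
The idea of an unsigned-indexed table with extra opcodes appended after the last tree code can be sketched in C as follows. All names here (toy_tree_code, TOY_OP_WIDEN_MULT_SIGNED, toy_range_op and so on) are hypothetical stand-ins, not the GCC declarations.

```c
#include <stddef.h>

/* Stand-in for the ordinary tree codes.  */
enum toy_tree_code { TOY_PLUS_EXPR, TOY_MULT_EXPR, TOY_MAX_TREE_CODES };

/* Non-standard opcodes start where the tree codes end, so a single
   unsigned index addresses both kinds of entry.  */
enum {
  TOY_OP_WIDEN_MULT_SIGNED = TOY_MAX_TREE_CODES,
  TOY_RANGE_OP_TABLE_SIZE
};

typedef long long (*toy_fold_fn)(int a, int b);

static long long toy_plus(int a, int b)
{
  return (long long) a + b;
}

/* Multiply in the wider type so the product cannot overflow int.  */
static long long toy_widen_mult_signed(int a, int b)
{
  return (long long) a * (long long) b;
}

static const toy_fold_fn toy_table[TOY_RANGE_OP_TABLE_SIZE] = {
  [TOY_PLUS_EXPR] = toy_plus,
  [TOY_OP_WIDEN_MULT_SIGNED] = toy_widen_mult_signed,
};

/* Direct lookup by unsigned opcode; NULL when there is no entry.  */
static toy_fold_fn toy_range_op(unsigned code)
{
  return code < (unsigned) TOY_RANGE_OP_TABLE_SIZE ? toy_table[code] : NULL;
}
```

No exported pointer is needed in this arrangement: a caller that wants the widening multiply simply passes TOY_OP_WIDEN_MULT_SIGNED as the index.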

	* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
	Use range_op_handler directly.
	* range-op.cc (range_op_handler::range_op_handler): Unsigned
	param instead of tree-code.
	(ptr_op_widen_plus_signed): Delete.
	(ptr_op_widen_plus_unsigned): Delete.
	(ptr_op_widen_mult_signed): Delete.
	(ptr_op_widen_mult_unsigned): Delete.
	(range_op_table::initialize_integral_ops): Add new opcodes.
	* range-op.h (range_op_handler): Use unsigned.
	(OP_WIDEN_MULT_SIGNED): New.
	(OP_WIDEN_MULT_UNSIGNED): New.
	(OP_WIDEN_PLUS_SIGNED): New.
	(OP_WIDEN_PLUS_UNSIGNED): New.
	(RANGE_OP_TABLE_SIZE): New.
	(range_op_table::operator []): Use unsigned.
	(range_op_table::set): Use unsigned.
	(m_range_tree): Make unsigned.
	(ptr_op_widen_mult_signed): Remove.
	(ptr_op_widen_mult_unsigned): Remove.
	(ptr_op_widen_plus_signed): Remove.
	(ptr_op_widen_plus_unsigned): Remove.
2023-06-12 10:51:09 -04:00
Andrew MacLeod
1b1de36ac8 Provide a default range_operator via range_op_handler.
range_op_handler now provides a default range_operator for any opcode,
so there is no longer a need to check for a valid operator.
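
A minimal sketch of the default-operator idea, assuming a toy table of function pointers; the names (toy_operator, toy_lookup, toy_handler_valid) are hypothetical and only illustrate the pattern of returning a do-nothing operator instead of NULL.

```c
#include <stdbool.h>
#include <stddef.h>

enum { OP_PLUS, OP_MINUS, OP_COUNT };

struct toy_operator {
  bool (*fold)(int a, int b, int *out);
};

static bool fold_plus(int a, int b, int *out)
{
  *out = a + b;
  return true;
}

/* The default operator folds nothing and reports failure.  */
static bool fold_default(int a, int b, int *out)
{
  (void) a; (void) b; (void) out;
  return false;
}

static const struct toy_operator plus_operator = { fold_plus };
static const struct toy_operator default_operator = { fold_default };

static const struct toy_operator *const toy_table[OP_COUNT] = {
  [OP_PLUS] = &plus_operator,
};

/* Never returns NULL: missing entries map to the default operator.  */
static const struct toy_operator *toy_lookup(unsigned code)
{
  if (code < OP_COUNT && toy_table[code] != NULL)
    return toy_table[code];
  return &default_operator;
}

/* Validity becomes a comparison against the default operator.  */
static bool toy_handler_valid(const struct toy_operator *op)
{
  return op != &default_operator;
}
```

Callers can invoke fold unconditionally and look at the boolean result, while code that still needs an explicit validity check compares against the default operator.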

	* gimple-range-op.cc (gimple_range_op_handler): Set m_operator
	manually as there is no access to the default operator.
	(cfn_copysign::fold_range): Don't check for validity.
	(cfn_ubsan::fold_range): Ditto.
	(gimple_range_op_handler::maybe_builtin_call): Don't set to NULL.
	* range-op.cc (default_operator): New.
	(range_op_handler::range_op_handler): Use default_operator
	instead of NULL.
	(range_op_handler::operator bool): Move from header, compare
	against default operator.
	(range_op_handler::range_op): New.
	* range-op.h (range_op_handler::operator bool): Move.
2023-06-12 10:51:06 -04:00
Andrew MacLeod
1c0aae69a7 Switch from unified table to range_op_table. There can be only one.
Now that there is only a single range_op_table, make the base table the
only table.

	* range-op.cc (unified_table): Delete.
	(range_op_table operator_table): Instantiate.
	(range_op_table::range_op_table): Rename from unified_table.
	(range_op_handler::range_op_handler): Use range_op_table.
	* range-op.h (range_op_table::operator []): Inline.
	(range_op_table::set): Inline.
2023-06-12 10:51:03 -04:00
Andrew MacLeod
2eb50117ca Remove type from range_op_handler table selection
With the unified table complete, we no longer need to specify a type
to choose a table when setting a range_op_handler.

	* gimple-range-gori.cc (gori_compute::condexpr_adjust): Do not
	pass type.
	* gimple-range-op.cc (get_code): Rename from get_code_and_type
	and simplify.
	(gimple_range_op_handler::supported_p): No need for type.
	(gimple_range_op_handler::gimple_range_op_handler): Ditto.
	(cfn_copysign::fold_range): Ditto.
	(cfn_ubsan::fold_range): Ditto.
	* ipa-cp.cc (ipa_vr_operation_and_type_effects): Ditto.
	* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Ditto.
	* range-op-float.cc (operator_plus::op1_range): Ditto.
	(operator_mult::op1_range): Ditto.
	(range_op_float_tests): Ditto.
	* range-op.cc (get_op_handler): Remove.
	(range_op_handler::set_op_handler): Remove.
	(operator_plus::op1_range): No need for type.
	(operator_minus::op1_range): Ditto.
	(operator_mult::op1_range): Ditto.
	(operator_exact_divide::op1_range): Ditto.
	(operator_cast::op1_range): Ditto.
	(operator_bitwise_not::fold_range): Ditto.
	(operator_negate::fold_range): Ditto.
	* range-op.h (range_op_handler::range_op_handler): Remove type param.
	(range_cast): No need for type.
	(range_op_table::operator[]): Check for enum_code >= 0.
	* tree-data-ref.cc (compute_distributive_range): No need for type.
	* tree-ssa-loop-unswitch.cc (unswitch_predicate): Ditto.
	* value-query.cc (range_query::get_tree_range): Ditto.
	* value-relation.cc (relation_oracle::validate_relation): Ditto.
	* vr-values.cc (range_of_var_in_loop): Ditto.
	(simplify_using_ranges::fold_cond_with_ops): Ditto.
2023-06-12 10:50:30 -04:00
Andrew MacLeod
110c1f8d30 Add a hybrid MAX_EXPR operator for integer and pointer.
This adds an operator to the unified table for MAX_EXPR which will
select either the pointer or integer version based on the type passed
to the method.   This is for use until we have a separate PRANGE class.

This also removes the pointer table which is no longer needed.
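
The hybrid dispatch can be sketched in C as a single entry point that forwards to an integer or a pointer routine depending on the type. Everything below is a toy model with hypothetical names; in particular, the pointer rule (track only null versus non-null) is an assumption of this sketch rather than something stated in the commit.

```c
#include <limits.h>

enum toy_kind { TOY_INTEGER, TOY_POINTER };

/* Closed unsigned range [lo, hi].  */
struct toy_range { unsigned long lo, hi; };

static unsigned long toy_umax(unsigned long a, unsigned long b)
{
  return a > b ? a : b;
}

/* Integer MAX: the bounds are the maxima of the operand bounds.  */
static struct toy_range integer_max(struct toy_range a, struct toy_range b)
{
  struct toy_range r = { toy_umax(a.lo, b.lo), toy_umax(a.hi, b.hi) };
  return r;
}

/* Pointer MAX (assumed rule): only nullness is tracked.  */
static struct toy_range pointer_max(struct toy_range a, struct toy_range b)
{
  struct toy_range nonnull = { 1, ULONG_MAX };
  struct toy_range null = { 0, 0 };
  struct toy_range varying = { 0, ULONG_MAX };
  if (a.lo > 0 && b.lo > 0)
    return nonnull;
  if (a.hi == 0 && b.hi == 0)
    return null;
  return varying;
}

/* The hybrid operator selects a version from the type it is given.  */
static struct toy_range hybrid_max(enum toy_kind kind, struct toy_range a,
                                   struct toy_range b)
{
  return kind == TOY_POINTER ? pointer_max(a, b) : integer_max(a, b);
}
```

With one table entry that dispatches internally, a lookup no longer needs a second, pointer-specific table.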

	* range-op-mixed.h (operator_max): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove MAX_EXPR.
	(pointer_table::pointer_table): Remove.
	(class hybrid_max_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_max_operator.
	* range-op.cc (pointer_tree_table): Remove.
	(unified_table::unified_table): Comment out MAX_EXPR.
	(get_op_handler): Remove check of pointer table.
	* range-op.h (class pointer_table): Remove.
2023-06-12 10:48:42 -04:00
Andrew MacLeod
73cbf402d3 Add a hybrid MIN_EXPR operator for integer and pointer.
This adds an operator to the unified table for MIN_EXPR which will
select either the pointer or integer version based on the type passed
to the method.   This is for use until we have a separate PRANGE class.

	* range-op-mixed.h (operator_min): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove MIN_EXPR.
	(class hybrid_min_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_min_operator.
	* range-op.cc (unified_table::unified_table): Comment out MIN_EXPR.
2023-06-12 10:48:40 -04:00
Andrew MacLeod
af5e7f0629 Add a hybrid BIT_IOR_EXPR operator for integer and pointer.
This adds an operator to the unified table for BIT_IOR_EXPR which will
select either the pointer or integer version based on the type passed
to the method.   This is for use until we have a separate PRANGE class.

	* range-op-mixed.h (operator_bitwise_or): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove BIT_IOR_EXPR.
	(class hybrid_or_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_or_operator.
	* range-op.cc (unified_table::unified_table): Comment out BIT_IOR_EXPR.
2023-06-12 10:48:38 -04:00
Andrew MacLeod
8e0f292f92 Add a hybrid BIT_AND_EXPR operator for integer and pointer.
This adds an operator to the unified table for BIT_AND_EXPR which will
select either the pointer or integer version based on the type passed
to the method.   This is for use until we have a separate PRANGE class.

	* range-op-mixed.h (operator_bitwise_and): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove BIT_AND_EXPR.
	(class hybrid_and_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_and_operator.
	* range-op.cc (unified_table::unified_table): Comment out BIT_AND_EXPR.
2023-06-12 10:48:31 -04:00
Andrew MacLeod
f6e160e35a Split pointer based range operators to range-op-ptr.cc
Move the pointer table and all pointer specific operators into a
new file for pointers.

	* Makefile.in (OBJS): Add range-op-ptr.o.
	* range-op-mixed.h (update_known_bitmask): Move prototype here.
	(minus_op1_op2_relation_effect): Move prototype here.
	(wi_includes_zero_p): Move function to here.
	(wi_zero_p): Ditto.
	* range-op.cc (update_known_bitmask): Remove static.
	(wi_includes_zero_p): Move to header.
	(wi_zero_p): Move to header.
	(minus_op1_op2_relation_effect): Remove static.
	(operator_pointer_diff): Move class and routines to range-op-ptr.cc.
	(pointer_plus_operator): Ditto.
	(pointer_min_max_operator): Ditto.
	(pointer_and_operator): Ditto.
	(pointer_or_operator): Ditto.
	(pointer_table): Ditto.
	(range_op_table::initialize_pointer_ops): Ditto.
	* range-op-ptr.cc: New.
2023-06-12 10:48:31 -04:00
Andrew MacLeod
f0278eb04f Move operator_max to the unified range-op table.
Also remove the integral table.

	* range-op-mixed.h (class operator_max): Move from...
	* range-op.cc (unified_table::unified_table): Add MAX_EXPR.
	(get_op_handler): Remove the integral table.
	(class operator_max): Move from here.
	(integral_table::integral_table): Delete.
	* range-op.h (class integral_table): Delete.
2023-06-12 10:48:30 -04:00
Andrew MacLeod
b08b98254a Move operator_min to the unified range-op table.
* range-op-mixed.h (class operator_min): Move from...
	* range-op.cc (unified_table::unified_table): Add MIN_EXPR.
	(class operator_min): Move from here.
	(integral_table::integral_table): Remove MIN_EXPR.
2023-06-12 10:48:30 -04:00
Andrew MacLeod
b23d6b957f Move operator_bitwise_or to the unified range-op table.
* range-op-mixed.h (class operator_bitwise_or): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_IOR_EXPR.
	(class operator_bitwise_or): Move from here.
	(integral_table::integral_table): Remove BIT_IOR_EXPR.
2023-06-12 10:48:30 -04:00
Andrew MacLeod
0965275e86 Move operator_bitwise_and to the unified range-op table.
At this point, the remaining 4 integral operations have different
implementations than pointers, so we now check for a pointer table
entry first, then if there is nothing, look at the unified table.

	* range-op-mixed.h (class operator_bitwise_and): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_AND_EXPR.
	(get_op_handler): Check for a pointer table entry first.
	(class operator_bitwise_and): Move from here.
	(integral_table::integral_table): Remove BIT_AND_EXPR.
2023-06-12 10:48:30 -04:00
Andrew MacLeod
af52b86297 Move operator_bitwise_xor to the unified range-op table.
* range-op-mixed.h (class operator_bitwise_xor): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_XOR_EXPR.
	(class operator_bitwise_xor): Move from here.
	(integral_table::integral_table): Remove BIT_XOR_EXPR.
	(pointer_table::pointer_table): Remove BIT_XOR_EXPR.
2023-06-12 10:48:30 -04:00
Andrew MacLeod
39636a09da Move operator_bitwise_not to the unified range-op table.
* range-op-mixed.h (class operator_bitwise_not): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_NOT_EXPR.
	(class operator_bitwise_not): Move from here.
	(integral_table::integral_table): Remove BIT_NOT_EXPR.
	(pointer_table::pointer_table): Remove BIT_NOT_EXPR.
2023-06-12 10:48:29 -04:00
Andrew MacLeod
443485b343 Move operator_addr_expr to the unified range-op table.
* range-op-mixed.h (class operator_addr_expr): Move from...
	* range-op.cc (unified_table::unified_table): Add ADDR_EXPR.
	(class operator_addr_expr): Move from here.
	(integral_table::integral_table): Remove ADDR_EXPR.
	(pointer_table::pointer_table): Remove ADDR_EXPR.
2023-06-12 10:48:29 -04:00
Gaius Mulley
bf47089590 PR modula2/110126 variables are reported as unused when referenced by ASM fix
This patch fixes the trash list of the asm statement.  It introduces a
separate build procedure for trashed elements.

gcc/m2/ChangeLog:

	PR modula2/110126
	* gm2-compiler/M2Quads.def (BuildAsmElement): Remove
	trash parameter.
	(BuildAsmTrash): New procedure.
	* gm2-compiler/M2Quads.mod (BuildAsmTrash): New procedure.
	(BuildAsmElement): Remove trash parameter.
	* gm2-compiler/P3Build.bnf (AsmTrashList): Rewrite.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-06-12 15:25:39 +01:00
Pan Li
7a4794af9a RISC-V: Fix one potential test failure for RVV vsetvl
The test can fail when run in parallel with the command below.  The failure
comes from a missing "Oz" option in the vsetvl check.

make -j $(nproc) report RUNTESTFLAGS="rvv.exp riscv.exp"

For some reason, this failure cannot be reproduced with RUNTESTFLAGS="rvv.exp"
or with make without the -j option. We would like to fix it now and root-cause
it later.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust test checking.
2023-06-12 22:11:18 +08:00
Pan Li
145b5db151 RISC-V: Support RVV FP16 MISC vget/vset intrinsic API
This patch supports the intrinsic API of FP16 ZVFHMIN vget/vset. From
the user's perspective, it is reasonable to do some get/set operations
for the vfloat16*_t types when only ZVFHMIN is enabled.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-types.def
	(vfloat16m1_t): Add type to lmul1 ops.
	(vfloat16m2_t): Likewise.
	(vfloat16m4_t): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
	* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Likewise.
2023-06-12 22:10:22 +08:00
Richard Biener
8d3eb3ad53 Fix disambiguation against .MASK_STORE
Alias analysis was treating .MASK_STORE as storing a full vector
which means we disambiguate against decls smaller than the vector size.
That's of course wrong and a similar issue was fixed for DSE already.
The following makes sure we set the size of the access to unknown
and only constrain max_size.
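
The reasoning can be illustrated with a toy model in C. This is not the tree-ssa-alias.cc logic: the descriptor, the disambiguation rule and all names are hypothetical simplifications, kept only to show why an exact size equal to the full vector is too strong a claim for a masked store.

```c
#include <stdbool.h>

#define TOY_SIZE_UNKNOWN (-1L)

/* Hypothetical, simplified access descriptor; sizes are in bytes.  */
struct toy_access {
  long size;      /* exact number of bytes written, or TOY_SIZE_UNKNOWN */
  long max_size;  /* upper bound on the bytes that may be written */
};

/* Toy rule: an access known to write exactly SIZE bytes cannot be a
   store into a declaration smaller than SIZE.  */
static bool toy_may_store_to_decl(struct toy_access acc, long decl_size)
{
  if (acc.size != TOY_SIZE_UNKNOWN && acc.size > decl_size)
    return false;
  return true;
}

/* Old view: a masked store always writes the whole vector.  */
static struct toy_access full_vector_store(long vector_bytes)
{
  struct toy_access acc = { vector_bytes, vector_bytes };
  return acc;
}

/* New view: the exact size is unknown, only the maximum is bounded.  */
static struct toy_access masked_store(long vector_bytes)
{
  struct toy_access acc = { TOY_SIZE_UNKNOWN, vector_bytes };
  return acc;
}
```

Under the old view a 64-byte vector store is wrongly ruled out as a store to a 16-byte decl, even though the mask may enable only the lanes that fit; under the new view the store is conservatively assumed to possibly touch it.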

This fixes runtime execution FAILs of gfortran.dg/matmul_2.f90,
gfortran.dg/matmul_6.f90 and gfortran.dg/pr91577.f90 when using
AVX512 with full masked loop vectorization on Zen4.

	* tree-ssa-alias.cc (call_may_clobber_ref_p_1): For
	.MASK_STORE and friends set the size of the access to
	unknown.
2023-06-12 15:19:24 +02:00
Tamar Christina
bc45e18d43 Remove DEFAULT_MATCHPD_PARTITIONS macro
As Jakub pointed out, DEFAULT_MATCHPD_PARTITIONS
is now unused and can be removed.

gcc/ChangeLog:

	* config.in: Regenerate.
	* configure: Regenerate.
	* configure.ac: Remove DEFAULT_MATCHPD_PARTITIONS.
2023-06-12 14:06:08 +01:00
Juzhe-Zhong
6631fe419c RISC-V: Add RVV narrow shift right lowering auto-vectorization
Optimize the following auto-vectorization codes:
void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
{
    for (int i = 0; i < n; i++)
      a[i] = b[i] >> c;
}

Before this patch:
foo:
        ble     a3,zero,.L5
.L3:
        vsetvli a5,a3,e32,m1,ta,ma
        vle32.v v1,0(a1)
        vsetvli a4,zero,e32,m1,ta,ma
        vsra.vx v1,v1,a2
        vsetvli zero,zero,e16,mf2,ta,ma
        slli    a7,a5,2
        vncvt.x.x.w     v1,v1
        slli    a6,a5,1
        vsetvli zero,a5,e16,mf2,ta,ma
        sub     a3,a3,a5
        vse16.v v1,0(a0)
        add     a1,a1,a7
        add     a0,a0,a6
        bne     a3,zero,.L3
.L5:
        ret

After this patch:
foo:
	ble	a3,zero,.L5
.L3:
	vsetvli	a5,a3,e32,m1,ta,ma
	vle32.v	v1,0(a1)
	vsetvli	a7,zero,e16,mf2,ta,ma
	slli	a6,a5,2
	vnsra.wx	v1,v1,a2
	slli	a4,a5,1
	vsetvli	zero,a5,e16,mf2,ta,ma
	sub	a3,a3,a5
	vse16.v	v1,0(a0)
	add	a1,a1,a6
	add	a0,a0,a4
	bne	a3,zero,.L3
.L5:
	ret

gcc/ChangeLog:

	* config/riscv/autovec-opt.md
	(*v<any_shiftrt:optab><any_extend:optab>trunc<mode>): New pattern.
	(*<any_shiftrt:optab>trunc<mode>): Ditto.
	* config/riscv/autovec.md (<optab><mode>3): Change to
	define_insn_and_split.
	(v<optab><mode>3): Ditto.
	(trunc<mode><v_double_trunc>2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.
2023-06-12 20:54:50 +08:00
Kyrylo Tkachov
921b841350 simplify-rtx: Implement constant folding of SS_TRUNCATE, US_TRUNCATE
This patch implements RTL constant-folding for the SS_TRUNCATE and US_TRUNCATE codes.
The semantics are a clamping operation on the argument with the min and max of the narrow mode,
followed by a truncation. The signedness of the clamp and the min/max extrema is derived from
the signedness of the saturating operation.
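
The clamp-then-truncate semantics can be written out in C for one concrete pair of modes. This is a sketch for a 32-bit to 16-bit narrowing only; the function names are hypothetical and the code is a scalar model, not the simplify-rtx.cc implementation.

```c
#include <stdint.h>

/* SS_TRUNCATE model: signed clamp to the narrow range, then truncate.  */
static int16_t ss_truncate_32_to_16(int32_t x)
{
  if (x > INT16_MAX)
    x = INT16_MAX;
  if (x < INT16_MIN)
    x = INT16_MIN;
  return (int16_t) x;
}

/* US_TRUNCATE model: unsigned clamp to the narrow maximum, then truncate.  */
static uint16_t us_truncate_32_to_16(uint32_t x)
{
  if (x > UINT16_MAX)
    x = UINT16_MAX;
  return (uint16_t) x;
}
```

Once both the operand and the modes are known at compile time, the result is a constant, which is what allows the operation to be folded away.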

We have a number of instructions in aarch64 that use SS_TRUNCATE and US_TRUNCATE to represent
their operations and we have pretty thorough runtime tests in gcc.target/aarch64/advsimd-intrinsics/vqmovn*.c.
With this patch the instructions are folded away at optimisation levels and the correctness checks still
pass.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

	* simplify-rtx.cc (simplify_const_unary_operation):
	Handle US_TRUNCATE, SS_TRUNCATE.
2023-06-12 13:25:45 +01:00
Juzhe-Zhong
84cbf560ff RISC-V: Add ZVFHMIN block autovec testcase
To be safe, add ZVFHMIN autovec block testcase to make sure
we won't enable autovec in ZVFHMIN by mistake.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: New test.
2023-06-12 20:23:13 +08:00
Eric Botcazou
e8d41e031b Fix oversight in latest change
gcc/
	PR modula2/109952
	* doc/gm2.texi (Standard procedures): Fix Next link.
2023-06-12 13:25:12 +02:00
Tamar Christina
7103155a93 Regenerate config.in
Looks like I forgot to regenerate config.in which
causes updates when you enable maintainer mode.

gcc/ChangeLog:

	* config.in: Regenerate.
2023-06-12 11:22:45 +01:00
Andre Vieira
3ad0ef34cc vect: Don't pass subtype to vect_widened_op_tree where not needed [PR 110142]
This patch fixes an issue introduced by
g:2f482a07365d9f4a94a56edd13b7f01b8f78b5a0, where a subtype was being passed
to vect_widened_op_tree, when no subtype was to be used. This led to an
erroneous use of IFN_VEC_WIDEN_MINUS.

gcc/ChangeLog:

	PR middle-end/110142
	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Don't pass
	subtype to vect_widened_op_tree and remove subtype parameter, also
	remove superfluous overloaded function definition.
	(vect_recog_widen_plus_pattern): Remove subtype parameter and don't pass
	to call to vect_recog_widen_op_pattern.
	(vect_recog_widen_minus_pattern): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr110142.c: New test.
2023-06-12 10:34:10 +01:00
liuhongt
e52be6034f Add missing vec_pack/unpacks patterns for _Float16 <-> int/float conversion.
This patch only supports optabs for vector modes whose length >= 128.
32/64-bit vectors are mostly handled by the BB vectorizer with
truncmn2/extendmn2/fix{,uns}_truncmn2.

gcc/ChangeLog:

	* config/i386/sse.md (vec_pack<floatprefix>_float_<mode>): New expander.
	(vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto.
	(vec_unpack_<fixprefix>fix_trunc_hi_<mode>): Ditto.
	(vec_unpacks_lo_<mode>): Ditto.
	(vec_unpacks_hi_<mode>): Ditto.
	(sse_movlhps_<mode>): New define_insn.
	(ssse3_palignr<mode>_perm): Extend to V_128H.
	(V_128H): New mode iterator.
	(ssepackPHmode): New mode attribute.
	(vunpck_extract_mode): Ditto.
	(vpckfloat_concat_mode): Extend to VxSI/VxSF for _Float16.
	(vpckfloat_temp_mode): Ditto.
	(vpckfloat_op_mode): Ditto.
	(vunpckfixt_mode): Extend to VxHF.
	(vunpckfixt_model): Ditto.
	(vunpckfixt_extract_mode): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vec_pack_fp16-1.c: New test.
	* gcc.target/i386/vec_pack_fp16-2.c: New test.
	* gcc.target/i386/vec_pack_fp16-3.c: New test.
2023-06-12 17:15:08 +08:00
Richard Biener
820d1aec89 middle-end/110200 - genmatch force-leaf and convert interaction
The following fixes GENERIC code generation for (convert! ...)
which currently generates

  if (TREE_TYPE (_o1[0]) != type)
    _r1 = fold_build1_loc (loc, NOP_EXPR, type, _o1[0]);
    if (EXPR_P (_r1))
      goto next_after_fail867;
  else
    _r1 = _o1[0];

where obviously braces are missing.
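
The effect of the missing braces can be reproduced with a small C model. The parameters are stand-ins: needs_convert plays the role of the type comparison, is_expr plays the role of EXPR_P (_r1), a return of -1 plays the role of the goto, and adding 100 stands for the conversion. None of this is the genmatch output itself.

```c
/* Shaped like the generated code quoted above: without braces the second
   if runs unconditionally and the else pairs with it.  A compiler may
   warn about the misleading indentation, which is the point.  */
static int convert_without_braces(int needs_convert, int is_expr, int operand)
{
  int r = 0;
  if (needs_convert)
    r = operand + 100;
    if (is_expr)
      return -1;
  else
    r = operand;
  return r;
}

/* Intended shape: braces around the if arm.  */
static int convert_with_braces(int needs_convert, int is_expr, int operand)
{
  int r = 0;
  if (needs_convert)
    {
      r = operand + 100;
      if (is_expr)
        return -1;
    }
  else
    r = operand;
  return r;
}
```

In the unbraced version a successful conversion is overwritten by the else arm, and the failure check also fires when no conversion was requested; the braced version does neither.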

	PR middle-end/110200
	* genmatch.cc (expr::gen_transform): Put braces around
	the if arm for the (convert ...) short-cut.
2023-06-12 11:01:26 +02:00
Jason Merrill
2764335bd3 c++: build initializer_list<string> in a loop [PR105838]
I previously applied this change in r13-4565 but reverted it due to
PR108071.  That PR was then fixed by r13-4712, but I didn't re-apply this
change then because we weren't making the array static; since r14-1500 for
PR110070 we now make the initializer array static, so let's bring this back.

In situations where the maybe_init_list_as_range optimization isn't viable,
we can build an initializer_list<string> with a loop over a constant array
of string literals.

This is represented using a VEC_INIT_EXPR, which required adjusting a couple
of places that expected the initializer array to have the same type as the
target array and fixing build_vec_init not to undo our efforts.

	PR c++/105838

gcc/cp/ChangeLog:

	* call.cc (convert_like_internal) [ck_list]: Use
	maybe_init_list_as_array.
	* constexpr.cc (cxx_eval_vec_init_1): Init might have
	a different type.
	* tree.cc (build_vec_init_elt): Likewise.
	* init.cc (build_vec_init): Handle from_array from a
	TARGET_EXPR.  Retain TARGET_EXPR of a different type.

gcc/testsuite/ChangeLog:

	* g++.dg/tree-ssa/initlist-opt5.C: New test.
2023-06-12 04:56:27 -04:00
Kewen Lin
ff83d1b47a rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]
As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
should be guarded under vsx rather than power7, as their
corresponding bif patterns have the conditions TARGET_VSX
and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode).  This patch moves
__builtin_{un,}pack_vector_int128 to stanza vsx to ensure
they are supported.

	PR target/109932

gcc/ChangeLog:

	* config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
	__builtin_unpack_vector_int128): Move from stanza power7 to vsx.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr109932-1.c: New test.
	* gcc.target/powerpc/pr109932-2.c: New test.
2023-06-12 01:08:22 -05:00
Kewen Lin
388809f2af rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]
As PR110011 shows, when encoding a 128-bit fp constant into
the toc, we use REAL_VALUE_TO_TARGET_LONG_DOUBLE, which
finds the first float mode with LONG_DOUBLE_TYPE_SIZE
bits of precision; that would be TFmode here.  But the 128-bit
fp constant can have mode IFmode or KFmode, which
doesn't necessarily have the same underlying float format
as TFmode.  As this PR exposes, with option
-mabi=ibmlongdouble TFmode has ibm_extended_format while
KFmode has ieee_quad_format, and mixing up the formats (the
encoding/decoding ways) causes unexpected results.

This patch makes it use the constant's own mode instead
of TFmode for the real_to_target call.
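
The effect of encoding with one format and decoding with another can be illustrated with a much smaller analogy, assuming IEEE 754 float and double. This does not use ibm_extended_format or ieee_quad_format, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert (std::numeric_limits<double>::is_iec559
               && std::numeric_limits<float>::is_iec559,
               "sketch assumes IEEE 754 float and double");
static_assert (sizeof (double) == 8 && sizeof (float) == 4,
               "sketch assumes 64-bit double and 32-bit float");

// Encode a value using the double format.
std::uint64_t encode_as_double (double v)
{
  std::uint64_t bits;
  std::memcpy (&bits, &v, sizeof bits);
  return bits;
}

// Decode with the matching format: round-trips exactly.
double decode_as_double (std::uint64_t bits)
{
  double v;
  std::memcpy (&v, &bits, sizeof v);
  return v;
}

// Decode the most significant 32 bits with the wrong format (float).
float decode_high_word_as_float (std::uint64_t bits)
{
  std::uint32_t hi = static_cast<std::uint32_t> (bits >> 32);
  float f;
  std::memcpy (&f, &hi, sizeof f);
  return f;
}
```

Here 1.0 encoded as a double has the high word 0x3FF00000, which reads back as 1.875 when interpreted as a float: the bits are unchanged, only the assumed format differs.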

	PR target/110011

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
	floating constant itself for real_to_target call.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr110011.c: New test.
2023-06-12 01:07:52 -05:00
Pan Li
b50b9d369c RISC-V: Add test cases for RVV FP16 undefined and vlmul trunc
This patch adds more tests for RVV FP16 undef and vlmul
trunc, namely

__riscv_vundefined_f16*();
__riscv_vlmul_trunc_v_f16*_f16*();

From the user's perspective, it is reasonable to perform the above operations
when only ZVFHMIN is enabled. This patch adds new test
cases to make sure the RVV FP16 vreinterpret works as expected.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add test cases.
	* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto.
2023-06-12 11:34:49 +08:00
Pan Li
7fc2b9ea2c RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API
This patch supports the intrinsic API of FP16 ZVFHMIN vlmul ext, namely:

vfloat16*_t <==> vfloat16*_t.

From the user's perspective, it is reasonable to do some type conversion
between vfloat16*_t and vfloat16*_t when only ZVFHMIN is enabled.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-types.def
	(vfloat16mf4_t): Add type to X2/X4/X8/X16/X32 vlmul ext ops.
	(vfloat16mf2_t): Ditto.
	(vfloat16m1_t): Ditto.
	(vfloat16m2_t): Ditto.
	(vfloat16m4_t): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
	* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add new test cases.
2023-06-12 11:34:17 +08:00
David Edelsohn
f47ecca875 aix: Debugging does not require a stack frame.
The rs6000 port has allocated a stack frame when debugging is enabled
on AIX since the earliest versions of the port.  Apparently the
earliest versions of the debuggers for AIX had difficulty with stackless
frames.

Both AIX DBX and GDB support stackless frames on AIX, and IBM XLC,
OpenXL and LLVM for AIX do not generate an extraneous stack frame when
debugging is enabled.  This patch updates the rs6000 stack info function
to not set the stack frame flag when debugging is enabled for AIX.

gcc/ChangeLog:

	* config/rs6000/rs6000-logue.cc (rs6000_stack_info):
	Do not require a stack frame when debugging is enabled for AIX.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
2023-06-11 21:28:31 -04:00
GCC Administrator
35e50a0eaa Daily bump.
2023-06-12 00:16:56 +00:00
Patrick Palka
59946a4c0c c++: unsynthesized defaulted constexpr fn [PR110122]
In this other testcase from PR110122, during regeneration of the generic
lambda with V=Bar{}, substitution followed by coerce_template_parms for
A<V>'s template argument naturally yields a copy of V in terms of Bar's
(implicitly) defaulted copy constructor.

This however happens inside a template context so although we introduced
a use of the copy constructor, mark_used didn't actually synthesize it,
which causes subsequent constant evaluation of the template argument to
fail with:

  nontype-class59.C: In instantiation of ‘void f() [with Bar V = Bar{Foo()}]’:
  nontype-class59.C:22:11:   required from here
  nontype-class59.C:18:18: error: ‘constexpr Bar::Bar(const Bar&)’ used before its definition

We already make sure to instantiate templated constexpr functions needed
for constant evaluation (as per P0859R0).  So this patch fixes this by
making us synthesize defaulted constexpr functions needed for constant
evaluation as well.

	PR c++/110122

gcc/cp/ChangeLog:

	* constexpr.cc (cxx_eval_call_expression): Synthesize defaulted
	functions needed for constant evaluation.
	(instantiate_cx_fn_r): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/nontype-class59.C: New test.
2023-06-11 11:27:10 -04:00
Patrick Palka
682d401a6b c++: extend lookup_template_class shortcut [PR110122]
Here when substituting the injected class name A during regeneration of
the lambda, we find ourselves in lookup_template_class for A<V> with
V=_ZTAXtl3BarEE (i.e. the template parameter object for Foo{}).  The call
to coerce_template_parms within then undesirably tries to make a copy of
this class NTTP argument, which fails because Foo is not copyable.  But it
seems clear that this testcase shouldn't require copyability of Foo.

lookup_template_class has a shortcut for looking up the current class
scope, which would avoid the problematic coerce_template_parms call, but
the shortcut doesn't trigger because it only considers the innermost
class scope which in this case is the lambda type.  So this patch fixes
this by extending the lookup_template_class shortcut to consider outer
class scopes too (and skipping over lambda types since they are never
specialized from lookup_template_class).  We also need to avoid calling
coerce_template_parms when specializing a templated non-template nested
class for the first time (such as A::B in the testcase).  Coercion should
be unnecessary there because the innermost arguments belong to the context
and so should have already been coerced.

	PR c++/110122

gcc/cp/ChangeLog:

	* pt.cc (lookup_template_class): Extend shortcut for looking up the
	current class scope to consider outer class scopes too, and use
	current_nonlambda_class_type instead of current_class_type.  Only
	call coerce_template_parms when specializing a primary template.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/nontype-class57.C: New test.
	* g++.dg/cpp2a/nontype-class58.C: New test.
2023-06-11 11:09:16 -04:00
Francois-Xavier Coudert
ecc96eb5d2 libgfortran: remove support for --enable-intermodule
libgfortran/

	PR libfortran/109373
	* configure.ac: Remove support for --enable-intermodule.
	* Makefile.am: Remove onestep path.
	* configure: Regenerate.
	* Makefile.in: Regenerate.
2023-06-11 15:34:07 +02:00
Georg-Johann Lay
3443d4ba04 Use canonical form for reversed single-bit insertions after reload.
We now split almost all insns after reload in order to add clobber of REG_CC.
If insns are coming from insn combiner and there is no canonical form for
the respective arithmetic (like for reversed bit insertions), there is
no need to keep all these different representations after reload:
Instead of splitting such patterns to their clobber-REG_CC-analogon, we can
split to a canonical representation, which is insv_notbit for the present case.
This is a no-op change.
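
To illustrate why one canonical form suffices, here is a small C++ sketch (not AVR backend code; the function names are hypothetical) of two ways to write "insert the inverse of one source bit into a destination bit". Both compute the same result, much as the different combiner representations all describe the same operation:

```cpp
#include <cassert>
#include <cstdint>

// Form 1: complement the source, then extract the bit.
std::uint8_t insert_notbit_via_not (std::uint8_t dst, unsigned dbit,
                                    std::uint8_t src, unsigned sbit)
{
  std::uint8_t bit = (static_cast<std::uint8_t> (~src) >> sbit) & 1u;
  return static_cast<std::uint8_t> ((dst & ~(1u << dbit)) | (bit << dbit));
}

// Form 2: extract the bit, then flip it with xor.
std::uint8_t insert_notbit_via_xor (std::uint8_t dst, unsigned dbit,
                                    std::uint8_t src, unsigned sbit)
{
  std::uint8_t bit = ((src >> sbit) & 1u) ^ 1u;
  return static_cast<std::uint8_t> ((dst & ~(1u << dbit)) | (bit << dbit));
}

// Exhaustively check that both forms agree for 8-bit operands.
bool forms_agree ()
{
  for (unsigned dst = 0; dst < 256; ++dst)
    for (unsigned src = 0; src < 256; ++src)
      for (unsigned dbit = 0; dbit < 8; ++dbit)
        for (unsigned sbit = 0; sbit < 8; ++sbit)
          if (insert_notbit_via_not (dst, dbit, src, sbit)
              != insert_notbit_via_xor (dst, dbit, src, sbit))
            return false;
  return true;
}
```

Since the forms are interchangeable, keeping a separate post-reload pattern for each buys nothing, which matches the reasoning given above for splitting them all to insv_notbit.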

gcc/
	* config/avr/avr.md (adjust_len) [insv_notbit_0, insv_notbit_7]:
	Remove attribute values.
	(insv_notbit): New post-reload insn.
	(*insv.not-shiftrt_split, *insv.xor1-bit.0_split)
	(*insv.not-bit.0_split, *insv.not-bit.7_split)
	(*insv.xor-extract_split): Split to insv_notbit.
	(*insv.not-shiftrt, *insv.xor1-bit.0, *insv.not-bit.0, *insv.not-bit.7)
	(*insv.xor-extract): Remove post-reload insns.
	* config/avr/avr.cc (avr_out_insert_notbit) [bitno]: Remove parameter.
	(avr_adjust_insn_length): Adjust call of avr_out_insert_notbit.
	[ADJUST_LEN_INSV_NOTBIT_0, ADJUST_LEN_INSV_NOTBIT_7]: Remove cases.
	* config/avr/avr-protos.h (avr_out_insert_notbit): Adjust prototype.
2023-06-11 13:54:14 +02:00