r0-126134-g5d2a9da9a7f7c1 added support for short-circuiting and combining
the ifs into using either AND or OR, but it only allowed the inner condition
basic block to contain the conditional alone. This changes that to allow up
to 2 defining statements, as long as they are just integer-to-integer
conversions for either the lhs or rhs of the conditional.
This should allow ccmp to be used on aarch64 and x86_64 (APX) slightly more
often than before.
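A hypothetical example of the shape this now handles (illustrative only,
not taken from the testsuite):

int f (int a, signed char b)
{
  if (a > 0)
    if (b > 1)  /* inner bb: char-to-int conversion stmt + conditional */
      return 1;
  return 0;
}

Here the promotion of b to int is a defining statement in the inner block,
which previously blocked combining the two conditions.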
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/85605
gcc/ChangeLog:
* tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function.
(ifcombine_ifandif): Use can_combine_bbs_with_short_circuit
instead of checking if iterator is one before the last statement.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Sometimes, when looking up the comparison of the GIMPLE_COND, we get back a
full SSA name rather than a predicate. We then want to look up `val != 0`
for the predicate.
Note this might happen with other boolean assignments and COND_EXPR too, but
I am not sure it is as important; I have not found a testcase yet.
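A sketch of the kind of flow involved (illustrative; the actual
fre-predicated-4.c testcase may differ):

int f (int a)
{
  int t = a == 0;  /* the comparison is computed into the SSA name t */
  if (t)           /* records predicate info for `t != 0' */
    {
      if (a == 0)  /* looking up `a == 0' returns the SSA name t, so we
                      additionally look up `t != 0' for the predicate */
        return a;  /* known to be 0 here */
    }
  return 1;
}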
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (process_bb): Look up
`val != 0` if we got back an SSA name when looking up the comparison.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-4.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
After the last patch, we also want to record `(A CMP B) != 0`
as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
true/false edges swapped.
This shows up more often due to the new handling of
`(A | B) ==/!= 0` in insert_predicates_for_cond,
since we now notice comparisons which were not seen before.
This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c`
and make sure we don't regress it when enhancing ifcombine.
This adds that predicate and allows us to optimize f
in fre-predicated-3.c.
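An illustrative sketch (the actual f in fre-predicated-3.c may differ):

int f (int a, int b)
{
  int t = a < b;
  if (t != 0)      /* now recorded simply as `a < b' being true */
    return a < b;  /* known true on this edge, folds to 1 */
  return 0;
}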
Changes since v1:
* v2: Use vn_valueize.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
For `(a | b) == 0`, we can "assert" on the true edge that
both `a == 0` and `b == 0` but nothing on the false edge.
For `(a | b) != 0`, we can "assert" on the false edge that
both `a == 0` and `b == 0` but nothing on the true edge.
This adds that predicate and allows us to optimize f0, f1,
and f2 in fre-predicated-[12].c.
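A sketch of the kind of function this enables optimizing (illustrative;
the actual testcase functions may differ):

int f0 (int a, int b)
{
  if ((a | b) == 0)
    return a + b;  /* a == 0 and b == 0 on this edge, so this folds to 0 */
  return 1;
}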
Changes since v1:
* v2: Use vn_valueize. Also canonicalize the comparison
at the beginning of insert_predicates_for_cond so that
constants are on the rhs. Return early for a
non-SSA name on the lhs (after canonicalization).
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison.
Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-1.c: New test.
* gcc.dg/tree-ssa/fre-predicated-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
To make it easier to add more predicates in some cases,
factor out the code. It also makes the code slightly more
readable, since it is not indented as much.
Bootstrapped and tested on x86_64.
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ...
(process_bb): Here.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
std::is_permutation is only used in <bits/hashtable.h> not in
<bits/hashtable_policy.h>, so move the comment referring to it.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h: Add is_permutation to comment.
* include/bits/hashtable_policy.h: Remove it from comment.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Technical Report 13): Remove 'iterator'
in 'map' clause of 'declare mapper' as it is already in the list above.
(Interoperability Routines): Add.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Document that depobj_list may be omitted in C++ and Fortran.
Until now, the structures that keep pragma information were different
in preprocessing-only mode and in normal mode. This change unifies
both so that the space and name of a pragma are always registered and
can be queried easily at a later time.
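For reference, a sketch of the registration interface this affects
(signature per gcc/c-family/c-pragma.h; the handler and the names below
are illustrative):

/* In a GCC plugin this would include "gcc-plugin.h" and
   "c-family/c-pragma.h" for cpp_reader and c_register_pragma.  */

static void
handle_my_pragma (cpp_reader *)
{
  /* Parse and act on `#pragma my_space my_pragma'.  */
}

void
register_my_pragma (void)
{
  /* After this change the space ("my_space") and name ("my_pragma")
     are recorded even in preprocessing-only mode.  */
  c_register_pragma ("my_space", "my_pragma", handle_my_pragma);
}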
gcc/c-family/ChangeLog:
* c-pragma.cc (struct pragma_pp_data): Use (struct internal_pragma_handler).
(c_register_pragma_1): Always register name and space for all pragmas.
(c_invoke_pragma_handler): Adapt.
(c_invoke_early_pragma_handler): Likewise.
(c_pp_invoke_early_pragma_handler): Likewise.
We currently make vect_check_gather_scatter happy by replacing SSA
name references in DR_REF for gather/scatter DRs, but the replacement
process only works once: for the second epilogue we have SSA names
from the first epilogue in DR_REF, and since we copied from the
original loop the SSA mapping doesn't work.
The following simply punts for non-first epilogues; gathers/scatters
recognized by patterns as IFNs are already analyzed and should work
fine.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse
to analyze DR_REF if from an epilogue that's not first.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment
how the substitution in DR_REF is broken.
The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one has access to both the main
vectorized loop info and the preceding vectorized epilogue.
This is critical for correctness as we need to disallow never
executed epilogues by costing in vect_analyze_loop_costing as
we assume those do not exist when deciding to add a skip-vector
edge during peeling. The patch also changes how multiple vector
epilogues are handled - instead of the epilogue_vinfos array in
the main loop info we now record the single epilogue_vinfo there
and further epilogues in the epilogue_vinfo member of the
epilogue info. This simplifies code.
* tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
(LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
(_loop_vec_info::epilogue_vinfo): Change from the epilogue_vinfos
array to a single element.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
main_loop_info and epilogue_vinfo. Remove epilogue_vinfos
allocation.
(_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
(vect_create_loop_vinfo): Rename parameter, set
LOOP_VINFO_MAIN_LOOP_INFO.
(vect_analyze_loop_1): Rename parameter.
(vect_analyze_loop_costing): Properly distinguish between
the main vector loop and the preceding epilogue.
(vect_analyze_loop): Change for epilogue_vinfos no longer
being a vector.
* tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
thereby handle a vector epilogue of a vector epilogue.
The following remembers how we advanced DRs when vectorizing an
epilogue. When we want to vectorize the epilogue of such an epilogue,
we have to retain that advancement and add the advancement for this
vectorized epilogue. Due to the way we copy and re-associate
stmt_vec_infos and DRs, recording this advancement and re-applying
it for the next epilogue is simplest.
* tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New.
(LOOP_VINFO_DRS_ADVANCED_BY): Likewise.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
drs_advanced_by.
(update_epilogue_loop_vinfo): Remember the DR advancement made.
(vect_transform_loop): Accumulate past advancements.
We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS
in case the main loop didn't (the other way around is OK); the
computation of whether the epilogue is executed or not gets out of sync
otherwise.
* tree-vect-loop.cc (vect_analyze_loop_2): Move
vect_analyze_loop_costing after check whether we can do
peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for
epilogues.
On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote:
> PR target/116725
> * gcc.target/i386/pr116725.c: Add test using those AVX builtins.
This test FAILs for me, as I don't have the latest gas around and the test
is dg-do assemble, so it needs not just a fixed compiler, but also an
assembler which supports those instructions.
The following patch adds effective target directives to ensure assembler
supports those too.
2024-11-07 Jakub Jelinek <jakub@redhat.com>
PR target/116725
* gcc.target/i386/pr116725.c: Add dg-require-effective-target
avx512{dq,fp16,vl}.
Apparently we need to explicitly disable AVX, not just enable SSE, to
guarantee the 16-lane vectors we need for the pattern match.
libgomp/ChangeLog:
* testsuite/libgomp.c/max_vf-1.c: Add -mno-avx.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/max_vf-1.c: Add -mno-avx.
This patch adds documentation for the two standard names below:
1. strided load: v = mask_len_strided_load (ptr, stride, mask, len, bias)
2. strided store: mask_len_strided_store (ptr, stride, v, mask, len, bias)
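A scalar sketch of the intended semantics (an assumption based on the
operand list above, taking the stride in bytes and len plus bias as the
active-lane bound):

void
mask_len_strided_load_ref (int *v, const char *ptr, long stride,
                           const int *mask, int len, int bias)
{
  for (long i = 0; i < len + bias; i++)
    if (mask[i])
      v[i] = *(const int *) (ptr + i * stride);
}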
gcc/ChangeLog:
* doc/md.texi: Add doc for mask_len_strided_load{store}.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
ext-dce uses TV_NONE, which is not OK for a pass taking 33% of compile time.
The following adds a timevar to it for proper blaming.
PR rtl-optimization/117467
* timevar.def (TV_EXT_DCE): New.
* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
We recently added support for cbranchbf4 with the AVX10.2 native bf16 comi
instructions, so do the same for cstorebf4.
gcc/ChangeLog:
* config/i386/i386.md (cstorebf4): Use vcomsbf16 under
TARGET_AVX10_2_256 and -fno-trapping-math.
(cbranchbf4): Adjust formatting.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-comibf-3.c: New test.
* gcc.target/i386/avx10_2-comibf-4.c: Likewise.
Since the test doesn't care whether the hint is correct,
modify the regexp of the hint part so that future
changes to the hint do not cause the test to fail.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr117304-1.c: Modify regexp.
It became apparent that conditions with deep SSA dependency trees
could be combined, which might thus require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.
Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.
Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts. Reset flow sensitive info and avoid
undefined behavior in moved stmts. Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.
The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.
Rework ifcombine to support merging conditions from noncontiguous
blocks. This depends on earlier preparation changes.
The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.
The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.
The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or to
require nondegenerate conditions, since degenerate ones aren't worth
combining with.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb. Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Note potential need for
changes to the post-combine logic.
Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed. There are two cases worth
mentioning:
- when blocks are noncontiguous, we have to place the combined
condition in the outer block to avoid pessimizing carefully crafted
short-circuited tests;
- even when blocks are contiguous, we prepare for situations in which
the combined condition has two tests, one to be placed in outer and
the other in inner. This circumstance will not come up when
noncontiguous ifcombine is first enabled, but it will when
an improved fold_truth_andor is integrated with ifcombine.
Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.
Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.
Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this. Leave it for the above to
gimplify and invert the condition.
In preparation for changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, that
is always identical to inner_inv.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm. Adjust all callers.
Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine. That
tree-level folder has long combined conditions involving loads, absent
other relevant side effects.
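For instance (a hypothetical example, not from the testsuite), both tests
below only read memory, so their blocks have vuses but no vdefs:

struct s { unsigned char a, b; };

int f (const struct s *p)
{
  /* Two loads, no stores: now eligible for ifcombine.  */
  return p->a == 1 && p->b == 2;
}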
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
but not vdefs.
Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c). Disable PIE on them, to match the expectations.
for gcc/testsuite/ChangeLog
* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.
When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call. Expect that mov or a
get_pc_thunk.bx call.
for gcc/testsuite/ChangeLog
* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.
This patch adds testcase for form1, as shown below:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{ \
T sum = (UT)x + (UT)IMM; \
return (x ^ IMM) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
Passed the rv64gcv regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Support signed
imm SAT_ADD form1.
* gcc.target/riscv/sat_s_add_imm-1-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-3-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-4.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-4.c: New test.
Since avx10_2-comibf-2.c is a run test, require AVX10.2 support.
* gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
This patch adds optimization of the following patterns:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M))), mask)
... where the mask is GET_MODE_MASK (N).
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M))) can be simplified to just (X:M)
and the whole optimization becomes:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M), mask)
The patch targets code patterns like:
not a0,a0
andi a0,a0,0xff
to be optimized to:
xori a0,a0,255
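Such a sequence can come from source as simple as this (hypothetical; the
actual pr112398.c contents may differ):

unsigned char
f (unsigned char x)
{
  return ~x;  /* not + andi 0xff simplifies to xori x, 255 */
}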
The change was locally tested for x86_64 and AArch64 (as the most common
targets) and for RV-64 and MIPS-32 (as targets benefiting from this
optimization): no regressions in all cases.
PR rtl-optimization/112398
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
cdtor_record needs to have an unsigned entry for the position in order to
match with vec_safe_length.
gcc/ChangeLog:
* config/darwin.cc (cdtor_record): Make position unsigned.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Ensure that GOMP_MAX_VF does the right thing for explicit schedules when
offloading is enabled ("target" directives are present), and is inactive
otherwise.
libgomp/ChangeLog:
* testsuite/libgomp.c/max_vf-1.c: New test.
* testsuite/libgomp.c/max_vf-2.c: New test.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/max_vf-1.c: New test.
Delay omp_max_vf call until after the host and device compilers have diverged
so that the max_vf value can be tuned exactly right on both variants.
This change means that the ompdevlow pass must be enabled for functions that
use OpenMP directives with both "simd" and "schedule" enabled.
gcc/ChangeLog:
* internal-fn.cc (expand_GOMP_MAX_VF): New function.
* internal-fn.def (GOMP_MAX_VF): New internal function.
* omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when
called in offload context, otherwise assume host context.
* omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.
The chunk size for SIMD loops should be right for the current device; too big
allocates too much memory, too small is inefficient. Getting it wrong doesn't
actually break anything though.
This patch attempts to choose the optimal setting based on the context. Both
host-fallback and device will get the same chunk size, but device performance
is the most important in this case.
gcc/ChangeLog:
* omp-expand.cc (is_in_offload_region): New function.
(omp_adjust_chunk_size): Add pass-through "offload" parameter.
(get_ws_args_for): Likewise.
(determine_parallel_type): Use is_in_offload_region to adjust call to
get_ws_args_for.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_chunk): Likewise.
If requested, return the vectorization factor appropriate for the offload
device, if any.
This change gives a significant speedup in the BabelStream "dot" benchmark on
amdgcn.
The omp_adjust_chunk_size use case is set to "false" for now, but I intend
to change that in a follow-up patch.
Note that NVPTX SIMT offload does not use this code-path.
gcc/ChangeLog:
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Call
omp_max_vf with offload == false.
* omp-expand.cc (omp_adjust_chunk_size): Likewise.
* omp-general.cc (omp_max_vf): Add "offload" parameter, and detect
amdgcn offload devices.
* omp-general.h (omp_max_vf): Likewise.
* omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to
omp_max_vf.
The Assume pass simply produces results, with no indication of how it
arrived at the results it gets. Add some output to the details listing.
The only functional change is that when gori is used to calculate a range
more than once (i.e., multiple uses), we now load the merged range rather
than just using the last calculated one.
* tree-assume.cc (assume_query::assume_query): Add debug output.
(assume_query::update_parms): Likewise.
(assume_query::calculate_phi): Likewise.
(assume_query::calculate_op): Likewise. Also pick up any
merged path values.
(assume_query::calculate_stmt): Likewise.
These headers make no sense for C++ programs, because they either define
different content to the corresponding <xxx.h> C header, or define
nothing at all in namespace std. They were all deprecated in C++17, so
add deprecation warnings to them, which can be disabled with
-Wno-deprecated. For C++20 and later these headers are no longer in the
standard at all, so compiling with _GLIBCXX_USE_DEPRECATED defined to 0
will give an error when they are included.
Because #warning is non-standard before C++23 we need to use pragmas to
ignore -Wc++23-extensions for the -Wsystem-headers -pedantic case.
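A sketch of the resulting guard in such a header (illustrative, not the
exact libstdc++ text):

#if __cplusplus > 201703L && ! _GLIBCXX_USE_DEPRECATED
# error "<ciso646> is not a standard header in C++20"
#elif __cplusplus >= 201703L
// #warning is a C++23 extension, hence the pragmas mentioned above
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wc++23-extensions"
# warning "<ciso646> is deprecated in C++17"
# pragma GCC diagnostic pop
#endif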
One g++ test needs adjustment because it includes <ciso646>, but that
can be made conditional on the __cplusplus value without any reduction
in test coverage.
For the library tests, consolidate the std_c++0x_neg.cc XFAIL tests into
the macros.cc test, using dg-error with a { target c++98_only }
selector. This avoids having two separate test files, one for C++98 and
one for everything later. Also add tests for the <xxx.h> headers to
ensure that they behave as expected and don't give deprecated warnings.
libstdc++-v3/ChangeLog:
* doc/xml/manual/evolution.xml: Document deprecations.
* doc/html/*: Regenerate.
* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of file. Include <complex> directly
instead of <ccomplex>.
* include/c_compatibility/tgmath.h: Include <cmath> and
<complex> directly, instead of <ctgmath>.
* include/c_global/ccomplex: Add deprecated #warning for C++17
and #error for C++20 if _GLIBCXX_USE_DEPRECATED == 0.
* include/c_global/ciso646: Likewise.
* include/c_global/cstdalign: Likewise.
* include/c_global/cstdbool: Likewise.
* include/c_global/ctgmath: Likewise.
* include/c_std/ciso646: Likewise.
* include/precompiled/stdc++.h: Do not include ccomplex,
ciso646, cstdalign, cstdbool, or ctgmath in C++17 and later.
* testsuite/18_support/headers/cstdalign/macros.cc: Check for
warnings and errors for unsupported dialects.
* testsuite/18_support/headers/cstdbool/macros.cc: Likewise.
* testsuite/26_numerics/headers/ctgmath/complex.cc: Likewise.
* testsuite/27_io/objects/char/1.cc: Do not include <ciso646>.
* testsuite/27_io/objects/wchar_t/1.cc: Likewise.
* testsuite/18_support/headers/cstdbool/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/cstdalign/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ccomplex/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ctgmath/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/ciso646/macros.cc: New test.
* testsuite/18_support/headers/ciso646/macros.h.cc: New test.
* testsuite/18_support/headers/cstdbool/macros.h.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.h.cc: New test.
* testsuite/26_numerics/headers/ctgmath/complex.h.cc: New test.
gcc/testsuite/ChangeLog:
* g++.old-deja/g++.other/headers1.C: Do not include ciso646 for
C++17 and later.
libstdc++-v3/ChangeLog:
* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of the header.
* include/c_global/ctgmath (_GLIBCXX_CTGMATH): Likewise.
Currently dereferencing an empty shared_ptr prints a complicated
internal type in the assertion message:
include/bits/shared_ptr_base.h:1377: std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::element_type& std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::operator*() const [with _Tp = std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool <anonymous> = false; bool <anonymous> = false; element_type = std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack]: Assertion '_M_get() != nullptr' failed.
Users don't care about any of the _Lp and <anonymous> template
parameters, so this is unnecessarily verbose.
We can simplify it to something that only mentions "shared_ptr_deref"
and the element type:
include/bits/shared_ptr_base.h:1371: _Tp* std::__shared_ptr_deref(_Tp*) [with _Tp = filesystem::__cxx11::recursive_directory_iterator::_Dir_stack]: Assertion '__p != nullptr' failed.
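Given the message above, the new helper is roughly of this shape (a sketch,
modulo attributes and constexpr details):

template<typename _Tp>
  _Tp*
  __shared_ptr_deref(_Tp* __p)
  {
    __glibcxx_assert(__p != nullptr);
    return __p;
  }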
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_base.h (__shared_ptr_deref): New
function template.
(__shared_ptr_access, __shared_ptr_access<>): Use it.
Several member functions of filesystem::directory_iterator and
filesystem::recursive_directory_iterator currently dereference their
shared_ptr data member without checking for non-null. Because they use
operator-> and that function only uses _GLIBCXX_DEBUG_PEDASSERT rather
than __glibcxx_assert there is no assertion even when the library is
built with _GLIBCXX_ASSERTIONS defined. This means that dereferencing
invalid directory iterators gives an unhelpful segfault.
By using (*p). instead of p-> we get an assertion when the library is
built with _GLIBCXX_ASSERTIONS, with a "_M_get() != nullptr" message.
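A self-contained illustration of the difference (not the actual fs_dir.cc
code):

#include <memory>

struct Dir { int depth() const { return 0; } };

int f ()
{
  std::shared_ptr<Dir> p;  // empty, like an invalid iterator
  // p->depth() would only run _GLIBCXX_DEBUG_PEDASSERT and segfault;
  // (*p).depth() hits __glibcxx_assert under _GLIBCXX_ASSERTIONS.
  return (*p).depth();
}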
libstdc++-v3/ChangeLog:
* src/c++17/fs_dir.cc (fs::directory_iterator::operator*): Use
shared_ptr::operator* instead of shared_ptr::operator->.
(fs::recursive_directory_iterator::options): Likewise.
(fs::recursive_directory_iterator::depth): Likewise.
(fs::recursive_directory_iterator::recursion_pending): Likewise.
(fs::recursive_directory_iterator::operator*): Likewise.
(fs::recursive_directory_iterator::disable_recursion_pending):
Likewise.
This patch disables propagation of ipcp information into partitions
where all instances of the node are marked to be inlined.
Motivation:
Incremental LTO needs stable values between compilations to be
effective. This requirement fails with following example:
void heavily_used_function(int);
...
heavily_used_function(__LINE__);
IPA-CP creates a long list of all __LINE__ arguments, and then
propagates it with every function clone, even though for inlined
functions this information is not useful.
gcc/ChangeLog:
* ipa-prop.cc (write_ipcp_transformation_info): Disable
unneeded value propagation.
* lto-cgraph.cc (lto_symtab_encoder_encode): Default values.
(lto_symtab_encoder_always_inlined_p): New.
(lto_set_symtab_encoder_not_always_inlined): New.
(add_node_to): Set always inlined.
* lto-streamer.h (struct lto_encoder_entry): New field.
(lto_symtab_encoder_always_inlined_p): New.
Store merging assumes a merged region won't be too large. The assumption
shows up e.g. in using inappropriate types in various spots (e.g. int for
bit sizes and bit positions in a few spots, or unsigned for the total size
in bytes of the merged region), in doing XNEWVEC for the whole total size
of the merged region and preparing everything in there, and even in
XALLOCAVEC in two spots. The last case is what was breaking the test below
in the patch: a 64MB XALLOCAVEC is just too large, but even with that fixed
I think we just shouldn't be merging gigabyte-large merge groups.
We already have the --param=store-merging-max-size= parameter, right now
with a 65536 byte maximum (if needed, we could raise that limit a little bit).
That parameter is currently used when merging two adjacent stores, if the
size of the already merged bitregion together with the new store's bitregion
is above that limit, we don't merge those.
I guess initially that was sufficient; at that time a store was always
limited to MAX_BITSIZE_MODE_ANY_INT bits.
But later on we've added support for empty ctors ({} and even later
{CLOBBER}) and also added another spot where we merge further stores into
the merge group: if there is some overlap, we can merge various other stores
in one coalesce_immediate_stores iteration.
And, we weren't applying the --param=store-merging-max-size= parameter
in either of those cases. So a single store can be gigabytes long, and
if there is some overlap, we can extend the region again to gigabytes in
size.
The following patch attempts to apply that parameter even in those cases.
So, when testing whether it should merge the merged group with info (we've
already punted if those together are above the parameter) and some other
stores, the first two hunks just punt if that would make the merge group
too large.
And the third hunk doesn't even add stores which are over the limit.
2024-11-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117439
* gimple-ssa-store-merging.cc
(imm_store_chain_info::coalesce_immediate_stores): Punt if merging of
any of the additional overlapping stores would result in growing the
bitregion size over param_store_merging_max_size.
(pass_store_merging::process_store): Terminate all aliasing chains
for stores with bitregion larger than param_store_merging_max_size.
* g++.dg/opt/pr117439.C: New test.
encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode, in which
it has to allocate a buffer and do various extra work like shifting the bits
etc., if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
doesn't have a corresponding integer mode.
The last case is explained later in the comments:
/* The native_encode_expr machinery uses TYPE_MODE to determine how many
bytes to write. This means it can write more than
ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
write 8 bytes for a bitlen of 40). Skip the bytes that are not within
bitlen and zero out the bits that are not relevant as well (that may
contain a sign bit due to sign-extension). */
Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
so that case doesn't need those fancy games unless bitlen or bitpos
aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
possible).
The following patch makes us use the fast path even for empty_ctor_p stores
which occupy full bytes: we can just memset that part of the provided buffer
and don't need to XALLOCAVEC another buffer.
This patch in itself fixes the testcase from the PR (which was about using
a huge XALLOCAVEC), but I want to do some other changes, to be posted in a
next patch.
2024-11-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117439
* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
integral mode.