Add LOOP_VINFO_MAIN_LOOP_INFO

The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one can have both access to the main
vectorized loop info and the preceeding vectorized epilogue.
This is critical for correctness as we need to disallow never
executed epilogues by costing in vect_analyze_loop_costing as
we assume those do not exist when deciding to add a skip-vector
edge during peeling.  The patch also changes how multiple vector
epilogues are handled - instead of the epilogue_vinfos array in
the main loop info we now record the single epilogue_vinfo there
and further epilogues in the epilogue_vinfo member of the
epilogue info.  This simplifies code.

	* tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
	(LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
	(_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos
	from array to single element.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	main_loop_info and epilogue_vinfo.  Remove epilogue_vinfos
	allocation.
	(_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
	(vect_create_loop_vinfo): Rename parameter, set
	LOOP_VINFO_MAIN_LOOP_INFO.
	(vect_analyze_loop_1): Rename parameter.
	(vect_analyze_loop_costing): Properly distinguish between
	the main vector loop and the preceeding epilogue.
	(vect_analyze_loop): Change for epilogue_vinfos no longer
	being a vector.
	* tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
	thereby handle a vector epilogue of a vector epilogue.
This commit is contained in:
Richard Biener 2024-11-04 12:58:41 +01:00 committed by Richard Biener
parent 2c25af0e52
commit 42d99f63cf
3 changed files with 53 additions and 48 deletions

View File

@ -3100,12 +3100,12 @@ vect_get_main_loop_result (loop_vec_info loop_vinfo, tree main_loop_value,
The analysis resulting in this epilogue loop's loop_vec_info was performed The analysis resulting in this epilogue loop's loop_vec_info was performed
in the same vect_analyze_loop call as the main loop's. At that time in the same vect_analyze_loop call as the main loop's. At that time
vect_analyze_loop constructs a list of accepted loop_vec_info's for lower vect_analyze_loop constructs a list of accepted loop_vec_info's for lower
vectorization factors than the main loop. This list is stored in the main vectorization factors than the main loop. This list is chained in the
loop's loop_vec_info in the 'epilogue_vinfos' member. Everytime we decide to loop's loop_vec_info in the 'epilogue_vinfo' member. When we decide to
vectorize the epilogue loop for a lower vectorization factor, the vectorize the epilogue loop for a lower vectorization factor, the
loop_vec_info sitting at the top of the epilogue_vinfos list is removed, loop_vec_info in epilogue_vinfo is updated and linked to the epilogue loop.
updated and linked to the epilogue loop. This is later used to vectorize This is later used to vectorize the epilogue.
the epilogue. The reason the loop_vec_info needs updating is that it was The reason the loop_vec_info needs updating is that it was
constructed based on the original main loop, and the epilogue loop is a constructed based on the original main loop, and the epilogue loop is a
copy of this loop, so all links pointing to statements in the original loop copy of this loop, so all links pointing to statements in the original loop
need updating. Furthermore, these loop_vec_infos share the need updating. Furthermore, these loop_vec_infos share the
@ -3128,7 +3128,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
profile_probability prob_prolog, prob_vector, prob_epilog; profile_probability prob_prolog, prob_vector, prob_epilog;
int estimated_vf; int estimated_vf;
int prolog_peeling = 0; int prolog_peeling = 0;
bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0; bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
/* We currently do not support prolog peeling if the target alignment is not /* We currently do not support prolog peeling if the target alignment is not
known at compile time. 'vect_gen_prolog_loop_niters' depends on the known at compile time. 'vect_gen_prolog_loop_niters' depends on the
target alignment being constant. */ target alignment being constant. */
@ -3255,13 +3255,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
else else
niters_prolog = build_int_cst (type, 0); niters_prolog = build_int_cst (type, 0);
loop_vec_info epilogue_vinfo = NULL; loop_vec_info epilogue_vinfo = loop_vinfo->epilogue_vinfo;
if (vect_epilogues)
{
epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
loop_vinfo->epilogue_vinfos.ordered_remove (0);
}
tree niters_vector_mult_vf = NULL_TREE; tree niters_vector_mult_vf = NULL_TREE;
/* Saving NITERs before the loop, as this may be changed by prologue. */ /* Saving NITERs before the loop, as this may be changed by prologue. */
tree before_loop_niters = LOOP_VINFO_NITERS (loop_vinfo); tree before_loop_niters = LOOP_VINFO_NITERS (loop_vinfo);

View File

@ -1071,7 +1071,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
has_mask_store (false), has_mask_store (false),
scalar_loop_scaling (profile_probability::uninitialized ()), scalar_loop_scaling (profile_probability::uninitialized ()),
scalar_loop (NULL), scalar_loop (NULL),
main_loop_info (NULL),
orig_loop_info (NULL), orig_loop_info (NULL),
epilogue_vinfo (NULL),
drs_advanced_by (NULL_TREE), drs_advanced_by (NULL_TREE),
vec_loop_iv_exit (NULL), vec_loop_iv_exit (NULL),
vec_epilogue_loop_iv_exit (NULL), vec_epilogue_loop_iv_exit (NULL),
@ -1128,8 +1130,6 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
} }
} }
} }
epilogue_vinfos.create (6);
} }
/* Free all levels of rgroup CONTROLS. */ /* Free all levels of rgroup CONTROLS. */
@ -1155,7 +1155,6 @@ _loop_vec_info::~_loop_vec_info ()
release_vec_loop_controls (&lens); release_vec_loop_controls (&lens);
delete ivexpr_map; delete ivexpr_map;
delete scan_map; delete scan_map;
epilogue_vinfos.release ();
delete scalar_costs; delete scalar_costs;
delete vector_costs; delete vector_costs;
@ -1936,15 +1935,20 @@ vect_analyze_loop_form (class loop *loop, gimple *loop_vectorized_call,
loop_vec_info loop_vec_info
vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared, vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
const vect_loop_form_info *info, const vect_loop_form_info *info,
loop_vec_info main_loop_info) loop_vec_info orig_loop_info)
{ {
loop_vec_info loop_vinfo = new _loop_vec_info (loop, shared); loop_vec_info loop_vinfo = new _loop_vec_info (loop, shared);
LOOP_VINFO_NITERSM1 (loop_vinfo) = info->number_of_iterationsm1; LOOP_VINFO_NITERSM1 (loop_vinfo) = info->number_of_iterationsm1;
LOOP_VINFO_NITERS (loop_vinfo) = info->number_of_iterations; LOOP_VINFO_NITERS (loop_vinfo) = info->number_of_iterations;
LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = info->number_of_iterations; LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = info->number_of_iterations;
LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = main_loop_info; LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_info;
if (orig_loop_info && LOOP_VINFO_EPILOGUE_P (orig_loop_info))
LOOP_VINFO_MAIN_LOOP_INFO (loop_vinfo)
= LOOP_VINFO_MAIN_LOOP_INFO (orig_loop_info);
else
LOOP_VINFO_MAIN_LOOP_INFO (loop_vinfo) = orig_loop_info;
/* Also record the assumptions for versioning. */ /* Also record the assumptions for versioning. */
if (!integer_onep (info->assumptions) && !main_loop_info) if (!integer_onep (info->assumptions) && !orig_loop_info)
LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions; LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
for (gcond *cond : info->conds) for (gcond *cond : info->conds)
@ -2314,17 +2318,19 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
{ {
loop_vec_info orig_loop_vinfo loop_vec_info orig_loop_vinfo
= LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo); = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
loop_vec_info main_loop_vinfo
= LOOP_VINFO_MAIN_LOOP_INFO (loop_vinfo);
unsigned lowest_vf unsigned lowest_vf
= constant_lower_bound (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)); = constant_lower_bound (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo));
int prolog_peeling = 0; int prolog_peeling = 0;
if (!vect_use_loop_mask_for_alignment_p (loop_vinfo)) if (!vect_use_loop_mask_for_alignment_p (main_loop_vinfo))
prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (orig_loop_vinfo); prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (main_loop_vinfo);
if (prolog_peeling >= 0 if (prolog_peeling >= 0
&& known_eq (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo), && known_eq (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
lowest_vf)) lowest_vf))
{ {
unsigned gap unsigned gap
= LOOP_VINFO_PEELING_FOR_GAPS (orig_loop_vinfo) ? 1 : 0; = LOOP_VINFO_PEELING_FOR_GAPS (main_loop_vinfo) ? 1 : 0;
scalar_niters = ((scalar_niters - gap - prolog_peeling) scalar_niters = ((scalar_niters - gap - prolog_peeling)
% lowest_vf + gap); % lowest_vf + gap);
} }
@ -3440,7 +3446,7 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
return true; return true;
} }
/* Analyze LOOP with VECTOR_MODES[MODE_I] and as epilogue if MAIN_LOOP_VINFO is /* Analyze LOOP with VECTOR_MODES[MODE_I] and as epilogue if ORIG_LOOP_VINFO is
not NULL. Set AUTODETECTED_VECTOR_MODE if VOIDmode and advance not NULL. Set AUTODETECTED_VECTOR_MODE if VOIDmode and advance
MODE_I to the next mode useful to analyze. MODE_I to the next mode useful to analyze.
Return the loop_vinfo on success and wrapped null on failure. */ Return the loop_vinfo on success and wrapped null on failure. */
@ -3448,13 +3454,13 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
static opt_loop_vec_info static opt_loop_vec_info
vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared, vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
const vect_loop_form_info *loop_form_info, const vect_loop_form_info *loop_form_info,
loop_vec_info main_loop_vinfo, loop_vec_info orig_loop_vinfo,
const vector_modes &vector_modes, unsigned &mode_i, const vector_modes &vector_modes, unsigned &mode_i,
machine_mode &autodetected_vector_mode, machine_mode &autodetected_vector_mode,
bool &fatal) bool &fatal)
{ {
loop_vec_info loop_vinfo loop_vec_info loop_vinfo
= vect_create_loop_vinfo (loop, shared, loop_form_info, main_loop_vinfo); = vect_create_loop_vinfo (loop, shared, loop_form_info, orig_loop_vinfo);
machine_mode vector_mode = vector_modes[mode_i]; machine_mode vector_mode = vector_modes[mode_i];
loop_vinfo->vector_mode = vector_mode; loop_vinfo->vector_mode = vector_mode;
@ -3471,7 +3477,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
res ? "succeeded" : "failed", res ? "succeeded" : "failed",
GET_MODE_NAME (loop_vinfo->vector_mode)); GET_MODE_NAME (loop_vinfo->vector_mode));
if (res && !main_loop_vinfo && suggested_unroll_factor > 1) if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) && suggested_unroll_factor > 1)
{ {
if (dump_enabled_p ()) if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, dump_printf_loc (MSG_NOTE, vect_location,
@ -3480,7 +3486,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
suggested_unroll_factor, suggested_unroll_factor,
slp_done_for_suggested_uf ? "on" : "off"); slp_done_for_suggested_uf ? "on" : "off");
loop_vec_info unroll_vinfo loop_vec_info unroll_vinfo
= vect_create_loop_vinfo (loop, shared, loop_form_info, main_loop_vinfo); = vect_create_loop_vinfo (loop, shared, loop_form_info, NULL);
unroll_vinfo->vector_mode = vector_mode; unroll_vinfo->vector_mode = vector_mode;
unroll_vinfo->suggested_unroll_factor = suggested_unroll_factor; unroll_vinfo->suggested_unroll_factor = suggested_unroll_factor;
opt_result new_res = vect_analyze_loop_2 (unroll_vinfo, fatal, NULL, opt_result new_res = vect_analyze_loop_2 (unroll_vinfo, fatal, NULL,
@ -3534,7 +3540,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
{ {
delete loop_vinfo; delete loop_vinfo;
if (fatal) if (fatal)
gcc_checking_assert (main_loop_vinfo == NULL); gcc_checking_assert (orig_loop_vinfo == NULL);
return opt_loop_vec_info::propagate_failure (res); return opt_loop_vec_info::propagate_failure (res);
} }
@ -3746,21 +3752,17 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
if (loop_vinfo) if (loop_vinfo)
{ {
if (pick_lowest_cost_p) if (pick_lowest_cost_p
&& first_loop_vinfo->epilogue_vinfo
&& vect_joust_loop_vinfos (loop_vinfo,
first_loop_vinfo->epilogue_vinfo))
{ {
/* Keep trying to roll back vectorization attempts while the gcc_assert (vect_epilogues);
loop_vec_infos they produced were worse than this one. */ delete first_loop_vinfo->epilogue_vinfo;
vec<loop_vec_info> &vinfos = first_loop_vinfo->epilogue_vinfos; first_loop_vinfo->epilogue_vinfo = nullptr;
while (!vinfos.is_empty ()
&& vect_joust_loop_vinfos (loop_vinfo, vinfos.last ()))
{
gcc_assert (vect_epilogues);
delete vinfos.pop ();
}
} }
/* For now only allow one epilogue loop. */ if (!first_loop_vinfo->epilogue_vinfo)
if (first_loop_vinfo->epilogue_vinfos.is_empty ()) first_loop_vinfo->epilogue_vinfo = loop_vinfo;
first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo);
else else
{ {
delete loop_vinfo; delete loop_vinfo;
@ -3779,11 +3781,12 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
} }
if (!first_loop_vinfo->epilogue_vinfos.is_empty ()) if (first_loop_vinfo->epilogue_vinfo)
{ {
poly_uint64 lowest_th poly_uint64 lowest_th
= LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo); = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
for (auto epilog_vinfo : first_loop_vinfo->epilogue_vinfos) loop_vec_info epilog_vinfo = first_loop_vinfo->epilogue_vinfo;
do
{ {
poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (epilog_vinfo); poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (epilog_vinfo);
gcc_assert (!LOOP_REQUIRES_VERSIONING (epilog_vinfo) gcc_assert (!LOOP_REQUIRES_VERSIONING (epilog_vinfo)
@ -3791,13 +3794,15 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
/* Keep track of the known smallest versioning threshold. */ /* Keep track of the known smallest versioning threshold. */
if (ordered_p (lowest_th, th)) if (ordered_p (lowest_th, th))
lowest_th = ordered_min (lowest_th, th); lowest_th = ordered_min (lowest_th, th);
epilog_vinfo = epilog_vinfo->epilogue_vinfo;
} }
while (epilog_vinfo);
LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo) = lowest_th; LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo) = lowest_th;
if (dump_enabled_p ()) if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, dump_printf_loc (MSG_NOTE, vect_location,
"***** Choosing epilogue vector mode %s\n", "***** Choosing epilogue vector mode %s\n",
GET_MODE_NAME GET_MODE_NAME
(first_loop_vinfo->epilogue_vinfos[0]->vector_mode)); (first_loop_vinfo->epilogue_vinfo->vector_mode));
} }
return first_loop_vinfo; return first_loop_vinfo;

View File

@ -987,12 +987,17 @@ public:
class loop *scalar_loop; class loop *scalar_loop;
/* For loops being epilogues of already vectorized loops /* For loops being epilogues of already vectorized loops
this points to the original vectorized loop. Otherwise NULL. */ this points to the main vectorized loop. Otherwise NULL. */
_loop_vec_info *main_loop_info;
/* For loops being epilogues of already vectorized loops
this points to the preceeding vectorized (possibly epilogue) loop.
Otherwise NULL. */
_loop_vec_info *orig_loop_info; _loop_vec_info *orig_loop_info;
/* Used to store loop_vec_infos of epilogues of this loop during /* Used to store loop_vec_infos of the epilogue of this loop during
analysis. */ analysis. */
vec<_loop_vec_info *> epilogue_vinfos; _loop_vec_info *epilogue_vinfo;
/* If this is an epilogue loop the DR advancement applied. */ /* If this is an epilogue loop the DR advancement applied. */
tree drs_advanced_by; tree drs_advanced_by;
@ -1096,6 +1101,7 @@ public:
#define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)->scalar_loop_scaling #define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)->scalar_loop_scaling
#define LOOP_VINFO_HAS_MASK_STORE(L) (L)->has_mask_store #define LOOP_VINFO_HAS_MASK_STORE(L) (L)->has_mask_store
#define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
#define LOOP_VINFO_MAIN_LOOP_INFO(L) (L)->main_loop_info
#define LOOP_VINFO_ORIG_LOOP_INFO(L) (L)->orig_loop_info #define LOOP_VINFO_ORIG_LOOP_INFO(L) (L)->orig_loop_info
#define LOOP_VINFO_SIMD_IF_COND(L) (L)->simd_if_cond #define LOOP_VINFO_SIMD_IF_COND(L) (L)->simd_if_cond
#define LOOP_VINFO_INNER_LOOP_COST_FACTOR(L) (L)->inner_loop_cost_factor #define LOOP_VINFO_INNER_LOOP_COST_FACTOR(L) (L)->inner_loop_cost_factor