wide-int: Fix wide_int division/remainder [PR112733]

This patch fixes the other half of the PR, where we were crashing on the UNSIGNED widest_int modulo of 1 % -128 (where the -128 in UNSIGNED is 131072 bit 0xfffff...fffff80). The patch actually has 2 independent parts. The fix for the divmod buffer overflow is in the 2nd-5th and 7th-8th hunks of the patch. abs (remainder) <= min (abs (dividend), abs (divisor)) and the wide-int.h callers estimate the remainder size to be the same as the quotient size, i.e. for UNSIGNED dividend with msb set quotient (same as remainder) precision based estimation, for SIGNED or dividend with msb clear estimate based on divident len (+ 1). For quotient (if any), wi_pack is called as quotient_len = wi_pack (quotient, b_quotient, m, dividend_prec); where m is m = dividend_blocks_needed; while (m > 1 && b_dividend[m - 1] == 0) m--; and dividend_blocks_needed = 2 * BLOCKS_NEEDED (dividend_prec); where wi_pack stores to result (in_len + 1) / 2 limbs (or one more if (in_len & 1) == 0 and in_len / 2 < BLOCKS_NEEDED (dividend_prec)). In any case, that is never more than BLOCKS_NEEDED (dividend_prec) and matches caller's estimations (then it is canonicalized and perhaps shrunk that way). The problematic case is the remainder, where wi_pack is called as *remainder_len = wi_pack (remainder, b_remainder, n, dividend_prec); The last argument is again based on the dividend precision like for quotient, but n is instead: n = divisor_blocks_needed; while (n > 1 && b_divisor[n - 1] == 0) n--; so based on the divisor rather than dividend. Now, in the wi::multiple_of_p (1, -128, UNSIGNED) widest_int precision case, dividend_prec is very small, the dividend is 1 and while the official precision is 131072 bits, dividend_len is 1 and if (sgn == SIGNED || dividend_val[dividend_len - 1] >= 0) dividend_prec = MIN ((dividend_len + 1) * HOST_BITS_PER_WIDE_INT, dividend_prec); shrinks the estimate to 128 bits in this case; but divisor_prec is huge, because the divisor is negative UNSIGNED, so 131072 bits, 2048 limbs, 4096 half-limbs (so divisor_blocks_needed and n are 4096). Now, wi_pack relies on the penultimate argument to be smaller or equal to 2 * BLOCKS_NEEDED of the last argument and that is not the case here, 4096 is certainly larger than 2 * BLOCKS_NEEDED (128) aka 4 and so wi_pack overflows the array. Unfortunately looking around, this isn't the only overflow, already in divmod_internal_2 we have an overflow, though there not overflowing a buffer during writing, but during reading. When divmod_internal_2 is called with the last 2 arguments like m=1, n=4096 as in this case (but generally any where m < n), it has a special case for n == 1 (which won't be the problem, we never have m or n 0), then decides based on msb of divisor whether there should be some shift canonicalization, then performs a for (j = m - n; j >= 0; j--) main loop and finally initializes b_remainder: if (s) for (i = 0; i < n; i++) b_remainder[i] = (b_dividend[i] >> s) | (b_dividend[i+1] << (HOST_BITS_PER_HALF_WIDE_INT - s)); else for (i = 0; i < n; i++) b_remainder[i] = b_dividend[i]; In the usual case of m >= n, the main loop will iterate at least once, b_dividend has m + 1 valid elements and the copying to b_remainder is correct. But for the m < n unusual case, the main loop never iterates and we want to copy the b_dividend right into b_remainder (possibly with shifts). Except when it copies n elements, it copies garbage which isn't there at all for the last n - m elements. Because the decision whether the main loop should iterate at all or not and the s computation should be based on the original n, the following patch sets n to MIN (n, m) only after the main loop, such that we copy to b_remainder at most what is initialized in b_dividend, and returns that adjusted value to the caller which then passes it to wi_pack and so satisfies again the requirement there, rather than trying to do the n = MIN (n, m) before calling divmod_internal_2 or after it. I believe the bug is mostly something I've only introduced myself in the > 576 bit wide_int/widest_int support changes, before that there was no attempt to decrease the precisions and so I think dividend_prec and divisor_prec were pretty much always the same (or always?), and even the copying of garbage wasn't the case because the only time m or n was decreased from the same precision was if there were 0s in the upper half-limbs. The 1st and 6th hunks in the patch is just that while looking at the code, I've noticed I computed wrong the sizes in the XALLOCAVEC case, somehow the 4 * in the static arrays led me believe I need to make the alloced buffers twice (in the multiplication) or 4 times (in the division/modulo cases) larger than needed. 2023-11-30 Jakub Jelinek <jakub@redhat.com> PR middle-end/112733 * wide-int.cc (wi::mul_internal): Don't allocate twice as much space for u, v and r as needed. (divmod_internal_2): Change return type from void to int, for n == 1 return 1, otherwise before writing b_dividend into b_remainder set n to MIN (n, m) and at the end return it. (wi::divmod_internal): Don't allocate 4 times as much space for b_quotient, b_remainder, b_dividend and b_divisor. Set n to result of divmod_internal_2. (wide_int_cc_tests): Add test for unsigned widest_int wi::multiple_of_p of 1 and -128.
2024-11-21 13:40:47 +00:00 · 2023-11-30 09:13:00 +01:00 · 2023-11-30 09:13:00 +01:00 · 248bf19714
commit 248bf19714
parent 6c9973e46b
1 changed files with 22 additions and 13 deletions
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@ -1477,10 +1477,10 @@ wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val,
  if (UNLIKELY (prec > WIDE_INT_MAX_INL_PRECISION))
    {
      unsigned HOST_HALF_WIDE_INT *buf
-	= XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * 4 * blocks_needed);
+	= XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * half_blocks_needed);
      u = buf;
-      v = u + 4 * blocks_needed;
-      r = v + 4 * blocks_needed;
+      v = u + half_blocks_needed;
+      r = v + half_blocks_needed;
    }

  /* We do unsigned mul and then correct it.  */
@ -1675,8 +1675,9 @@ wi::sub_large (HOST_WIDE_INT *val, const HOST_WIDE_INT *op0,
   Delight by Warren, which itself is a small modification of Knuth's
   algorithm.  M is the number of significant elements of U however
   there needs to be at least one extra element of B_DIVIDEND
-   allocated, N is the number of elements of B_DIVISOR.  */
-static void
+   allocated, N is the number of elements of B_DIVISOR.
+   Return new value for N.  */
+static int
 divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient,
 		   unsigned HOST_HALF_WIDE_INT *b_remainder,
 		   unsigned HOST_HALF_WIDE_INT *b_dividend,
@ -1707,7 +1708,7 @@ divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient,
 		  * (unsigned HOST_WIDE_INT)b_divisor[0]));
 	}
      b_remainder[0] = k;
-      return;
+      return 1;
    }

  s = clz_hwi (b_divisor[n-1]) - HOST_BITS_PER_HALF_WIDE_INT; /* CHECK clz */
@ -1770,6 +1771,10 @@ divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient,
 	  b_dividend[j+n] += k;
 	}
    }
+  /* If N > M, the main loop was skipped, quotient will be 0 and
+     we can't copy more than M half-limbs into the remainder, as they
+     aren't present in b_dividend (which has .  */
+  n = MIN (n, m);
  if (s)
    for (i = 0; i < n; i++)
      b_remainder[i] = (b_dividend[i] >> s)
@ -1777,6 +1782,7 @@ divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient,
  else
    for (i = 0; i < n; i++)
      b_remainder[i] = b_dividend[i];
+  return n;
 }


@ -1943,14 +1949,14 @@ wi::divmod_internal (HOST_WIDE_INT *quotient, unsigned int *remainder_len,
    {
      unsigned HOST_HALF_WIDE_INT *buf
 	= XALLOCAVEC (unsigned HOST_HALF_WIDE_INT,
-		      12 * dividend_blocks_needed
-		      + 4 * divisor_blocks_needed + 1);
+		      3 * dividend_blocks_needed + 1
+		      + divisor_blocks_needed);
      b_quotient = buf;
-      b_remainder = b_quotient + 4 * dividend_blocks_needed;
-      b_dividend = b_remainder + 4 * dividend_blocks_needed;
-      b_divisor = b_dividend + 4 * dividend_blocks_needed + 1;
+      b_remainder = b_quotient + dividend_blocks_needed;
+      b_dividend = b_remainder + dividend_blocks_needed;
+      b_divisor = b_dividend + dividend_blocks_needed + 1;
      memset (b_quotient, 0,
-	      4 * dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT));
+	      dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT));
    }
  wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (),
 	     dividend_blocks_needed, dividend_prec, UNSIGNED);
@ -1969,7 +1975,7 @@ wi::divmod_internal (HOST_WIDE_INT *quotient, unsigned int *remainder_len,
  if (b_quotient == b_quotient_buf)
    memset (b_quotient_buf, 0, sizeof (b_quotient_buf));

-  divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n);
+  n = divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n);

  unsigned int quotient_len = 0;
  if (quotient)
@ -2673,6 +2679,9 @@ wide_int_cc_tests ()
 	     wi::shifted_mask (0, 128, false, 128));
  ASSERT_EQ (wi::mask (128, true, 128),
 	     wi::shifted_mask (0, 128, true, 128));
+  ASSERT_EQ (wi::multiple_of_p (from_int <widest_int> (1),
+				from_int <widest_int> (-128), UNSIGNED),
+	     false);
 }

 } // namespace selftest