varasm: Shorten assembly of strings with larger zero regions

When the .base64 directive is not used, we emit long sequences of zeros as
        .string "foobarbaz"
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
The following patch changes that to
        .string "foobarbaz"
        .zero   12
It keeps emitting .string "" if there is just one zero, or two zeros where
the first one is preceded by non-zeros, so we can still have
        .string "foobarbaz"
        .string ""
or
        .base64 "VG8gYmUgb3Igbm90IHRvIGJlLCB0aGF0IGlzIHRoZSBxdWVzdGlvbg=="
        .string ""
but not 2 .string "" in a row.
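
The rule above can be sketched as a standalone C function. This is a simplified illustration, not the varasm.cc code: sketch_output_ascii is a hypothetical name, the directive spellings are assumed, the non-zero bytes are assumed to be printable and to need no escaping, and both ELF_STRING_LIMIT and the .base64 path are ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): write directives for the LEN
   bytes at S into OUT, a buffer of OUT_SIZE bytes, and return the number
   of characters written.  Assumes printable, escape-free non-zero bytes.  */
static size_t
sketch_output_ascii (char *out, size_t out_size, const char *s, size_t len)
{
  const char *limit = s + len;
  size_t used = 0;

  assert (out != NULL && out_size > 0 && s != NULL);
  out[0] = '\0';

  while (s < limit)
    {
      /* Find the next zero byte at or after S, if any.  */
      const char *p = memchr (s, '\0', (size_t) (limit - s));
      size_t room = out_size - used;
      int n;

      if (p == NULL)
        {
          /* No terminating zero left: emit the tail without one.  */
          n = snprintf (out + used, room, "\t.ascii\t\"%.*s\"\n",
                        (int) (limit - s), s);
          s = limit;
        }
      else if (p == s && p + 1 < limit && p[1] == '\0')
        {
          /* Two or more zeros in a row that do not terminate a preceding
             string: emit the whole run as one skip.  */
          for (p = s + 2; p < limit && *p == '\0'; p++)
            continue;
          n = snprintf (out + used, room, "\t.zero\t%zu\n", (size_t) (p - s));
          s = p;
        }
      else
        {
          /* A possibly empty string terminated by the zero at P.  */
          n = snprintf (out + used, room, "\t.string\t\"%s\"\n", s);
          s = p + 1;
        }

      if (n < 0 || (size_t) n >= room)
        {
          /* Output did not fit: drop the partial directive and stop.  */
          out[used] = '\0';
          break;
        }
      used += (size_t) n;
    }
  return used;
}
```

Under these assumptions, "foobarbaz" followed by 13 zero bytes yields .string "foobarbaz" plus .zero 12, while "foobarbaz" followed by only 2 zero bytes still yields .string "foobarbaz" plus .string "".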

On a testcase with around 310440 unsigned char constants (values 0-255),
mostly derived from the start of cc1plus but adjusted to have at most 126
consecutive 0s (longer sequences of 0s broke the transformation to
STRING_CST), I see:
1504498 bytes long assembly without this patch on i686-linux (without
.base64 support in binutils)
1155071 bytes long assembly with this patch on i686-linux (without .base64
support in binutils)
431390 bytes long assembly without this patch on x86_64-linux (with
.base64 support in binutils)
427593 bytes long assembly with this patch on x86_64-linux (with .base64
support in binutils)
All 4 assemble to an identical *.o file when using an x86_64-linux gas that
supports .base64, and the first 2 also assemble to identical content when
using an older x86_64-linux gas.
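
Most of the saving without .base64 comes from the text cost of each zero byte, which the heuristic in the patch counts as 2 + strlen (STRING_ASM_OP) + 1. A rough sketch of that arithmetic follows, assuming the directives are spelled "\t.string\t" and "\t.zero\t" and that a skip is printed as the directive, a decimal count and a newline. The SKETCH_* macros and the function names are hypothetical, not GCC's.

```c
#include <stdio.h>
#include <string.h>

/* Assumed directive spellings; the real ones are target-defined.  */
#define SKETCH_STRING_OP "\t.string\t"
#define SKETCH_SKIP_OP "\t.zero\t"

/* Characters of assembly text needed to emit N zero bytes as N empty
   strings: directive, two quotes and a newline for every byte.  */
static size_t
cost_as_strings (unsigned long n)
{
  return n * (strlen (SKETCH_STRING_OP) + 2 + 1);
}

/* Characters of assembly text needed to emit N zero bytes as one skip:
   directive, decimal count and a newline.  */
static size_t
cost_as_skip (unsigned long n)
{
  char digits[32];
  int len = snprintf (digits, sizeof digits, "%lu", n);

  return strlen (SKETCH_SKIP_OP) + (size_t) len + 1;
}
```

With these assumed spellings the empty-string form grows linearly with the run length, while the skip form grows only with the number of digits in the count.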

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

	* varasm.cc (default_elf_asm_output_ascii): Use ASM_OUTPUT_SKIP instead
	of 2 or more default_elf_asm_output_limited_string (f, "") calls and
	adjust base64 heuristics correspondingly.
Jakub Jelinek 2024-07-17 17:30:24 +02:00 committed by Jakub Jelinek
parent 0135a90de5
commit d8a75353dd

@@ -8538,6 +8538,7 @@ default_elf_asm_output_ascii (FILE *f, const char *s, unsigned int len)
           if (s >= last_base64)
             {
               unsigned cnt = 0;
+              unsigned char prev_c = ' ';
               const char *t;
               for (t = s; t < limit && (t - s) < (long) ELF_STRING_LIMIT - 1; t++)
                 {
@@ -8560,7 +8561,13 @@ default_elf_asm_output_ascii (FILE *f, const char *s, unsigned int len)
                       break;
                     case 1:
                       if (c == 0)
-                        cnt += 2 + strlen (STRING_ASM_OP) + 1;
+                        {
+                          if (prev_c == 0
+                              && t + 1 < limit
+                              && (t + 1 - s) < (long) ELF_STRING_LIMIT - 1)
+                            break;
+                          cnt += 2 + strlen (STRING_ASM_OP) + 1;
+                        }
                       else
                         cnt += 4;
                       break;
@@ -8568,6 +8575,7 @@ default_elf_asm_output_ascii (FILE *f, const char *s, unsigned int len)
                       cnt += 2;
                       break;
                     }
+                  prev_c = c;
                 }
               if (cnt > ((unsigned) (t - s) + 2) / 3 * 4 && (t - s) >= 3)
                 {
@@ -8633,8 +8641,18 @@ default_elf_asm_output_ascii (FILE *f, const char *s, unsigned int len)
               bytes_in_chunk = 0;
             }
-          default_elf_asm_output_limited_string (f, s);
-          s = p;
+          if (p == s && p + 1 < limit && p[1] == '\0')
+            {
+              for (p = s + 2; p < limit && *p == '\0'; p++)
+                continue;
+              ASM_OUTPUT_SKIP (f, (unsigned HOST_WIDE_INT) (p - s));
+              s = p - 1;
+            }
+          else
+            {
+              default_elf_asm_output_limited_string (f, s);
+              s = p;
+            }
         }
       else
         {