mirror of
https://github.com/gcc-mirror/gcc.git
synced 2024-11-21 13:40:47 +00:00
d0e8f58b81
It is autumn again and there is a new Unicode version 16.0.
The following patch updates our Unicode stuff in contrib, libcpp and
libstdc++ from that Unicode version.

2024-10-08 Jakub Jelinek <jakub@redhat.com>

contrib/
	* unicode/README: Update glibc git commit hash, replace
	Unicode 15 or 15.1 versions with 16.
	* unicode/gen_libstdcxx_unicode_data.py: Use 160000 instead of
	150100 in _GLIBCXX_GET_UNICODE_DATA test.
	* unicode/from_glibc/utf8_gen.py: Updated from glibc
	064c708c78cc2a6b5802dce73108fc0c1c6bfc80 commit.
	* unicode/DerivedCoreProperties.txt: Updated from Unicode 16.0.
	* unicode/emoji-data.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/GraphemeBreakProperty.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/UnicodeData.txt: Likewise.
	* unicode/EastAsianWidth.txt: Likewise.

gcc/testsuite/
	* c-c++-common/cpp/named-universal-char-escape-1.c: Add tests for
	some Unicode 16.0 characters, both normal and generated.

libcpp/
	* makeucnid.cc (write_copyright): Update Unicode Copyright years.
	* makeuname2c.cc (generated_ranges): Adjust Unicode version from
	15.1 to 16.0. Add EGYPTIAN HIEROGLYPH- generated range, adjust
	indexes in following entries.
	(write_copyright): Update Unicode Copyright years.
	* generated_cpp_wcwidth.h: Regenerated.
	* ucnid.h: Regenerated.
	* uname2c.h: Regenerated.

libstdc++-v3/
	* include/bits/unicode.h (std::__unicode::__v15_1_0): Rename inline
	namespace to ...
	(std::__unicode::__v16_0_0): ... this.
	(_GLIBCXX_GET_UNICODE_DATA): Change from 150100 to 160000.
	* include/bits/unicode-data.h: Regenerated.
	* testsuite/ext/unicode/properties.cc: Check for _Gcb_SpacingMark on
	U+11F03 rather than U+1D16D as the latter lost SpacingMark property
	in Unicode 16.0.

This directory contains a mechanism for GCC to have its own internal
implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.cc),
as well as a mechanism to update the information about codepoints permitted in
identifiers, which is encoded in libcpp/ucnid.h, and the mapping between
Unicode names and codepoints, which is encoded in libcpp/uname2c.h.

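As a rough illustration of what a wcwidth-style lookup table does, here is a
minimal Python sketch that maps a codepoint to a display width by binary
search over sorted ranges. The table SAMPLE_RANGES and the function
sample_width are hypothetical and hand-written for this example; they are not
the contents or layout of generated_cpp_wcwidth.h, and cpp_wcwidth () itself
is implemented in C++.

```python
from bisect import bisect_right

# Hypothetical, hand-picked sample of (first, last, width) ranges, sorted by
# codepoint. The real tables are generated from the Unicode data files.
SAMPLE_RANGES = [
    (0x0300, 0x036F, 0),  # combining diacritical marks
    (0x1100, 0x115F, 2),  # Hangul Jamo leading consonants (wide)
    (0x4E00, 0x9FFF, 2),  # CJK unified ideographs (wide)
]
_STARTS = [first for first, _, _ in SAMPLE_RANGES]


def sample_width(codepoint, default=1):
    """Return the width for CODEPOINT from SAMPLE_RANGES, else DEFAULT."""
    # Find the last range whose first codepoint is <= CODEPOINT.
    i = bisect_right(_STARTS, codepoint) - 1
    if i >= 0:
        _, last, width = SAMPLE_RANGES[i]
        if codepoint <= last:
            return width
    return default
```

With this sample table, sample_width(0x4E2D) falls in the CJK range and
sample_width(0x41) falls through to the default.
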
The idea is to produce the necessary lookup tables
(../../libcpp/{ucnid.h,uname2c.h,generated_cpp_wcwidth.h}) in a reproducible
way, starting from the following files that are distributed by the Unicode
Consortium:

    ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
    ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
    ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
    ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
    ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
    ftp://ftp.unicode.org/Public/UNIDATA/NameAliases.txt

Two additional files are needed for lookup tables in libstdc++:

    ftp://ftp.unicode.org/Public/UNIDATA/auxiliary/GraphemeBreakProperty.txt
    ftp://ftp.unicode.org/Public/UNIDATA/emoji/emoji-data.txt

All these files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.

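Refreshing the eight data files listed above can be scripted. The following
Python sketch is only an illustration: the names BASE, data_file_urls and
fetch_all are hypothetical, the base URL is the one given in this README, and
the sketch does not check whether that server is reachable from your network
(pass a different base if you use another mirror).

```python
import os
from urllib.request import urlretrieve

# Base URL as given in this README; an assumption as far as availability goes.
BASE = "ftp://ftp.unicode.org/Public/UNIDATA/"

# Files used for the libcpp tables.
LIBCPP_FILES = [
    "UnicodeData.txt",
    "EastAsianWidth.txt",
    "PropList.txt",
    "DerivedNormalizationProps.txt",
    "DerivedCoreProperties.txt",
    "NameAliases.txt",
]

# Additional files used for the libstdc++ tables.
LIBSTDCXX_FILES = [
    "auxiliary/GraphemeBreakProperty.txt",
    "emoji/emoji-data.txt",
]


def data_file_urls(base=BASE):
    """Return the full URL of every data file, in the order listed above."""
    return [base + name for name in LIBCPP_FILES + LIBSTDCXX_FILES]


def fetch_all(dest_dir=".", base=BASE):
    """Download every data file into DEST_DIR, dropping subdirectories."""
    for url in data_file_urls(base):
        target = os.path.join(dest_dir, url.rsplit("/", 1)[1])
        urlretrieve(url, target)
```

Calling fetch_all() from this directory would overwrite the checked-in copies,
so review the resulting diff before committing.
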
In order to keep in sync with glibc's wcwidth as much as possible, it is
desirable for the logic that processes the Unicode data to be the same as
glibc's. To that end, the from_glibc/ subdirectory of this directory holds
the glibc Python code that implements that logic. This code was copied
verbatim from glibc, and it can be updated at any time from the glibc
source code repository. The files copied from that repository are:

    localedata/unicode-gen/unicode_utils.py
    localedata/unicode-gen/utf8_gen.py

The most recent versions added to GCC are from glibc git commit:

    064c708c78cc2a6b5802dce73108fc0c1c6bfc80

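For orientation, property files such as EastAsianWidth.txt and PropList.txt
consist of lines of the form "codepoint-or-range ; value # comment". The
Python sketch below parses one such line. It is a hypothetical helper written
for this README, not the code in from_glibc/, and it does not handle
UnicodeData.txt, which uses a different multi-field layout.

```python
def parse_property_line(line):
    """Parse one property-file line.

    Returns (first, last, value) with FIRST and LAST as integers, or None
    for blank lines and lines that contain only a comment.
    """
    # Everything after '#' is a comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    code_range, value = fields[0], fields[1]
    if ".." in code_range:
        first, last = code_range.split("..")
    else:
        first = last = code_range
    return int(first, 16), int(last, 16), value
```

For example, parse_property_line("3000;F # Zs IDEOGRAPHIC SPACE") yields
the single-codepoint range for U+3000 with value "F".
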
The script gen_wcwidth.py found here contains the GCC-specific code to
map glibc's output to the lookup tables we require. This script should not need
to change, unless there are structural changes to the Unicode data files or to
the glibc code. Similarly, makeucnid.cc in ../../libcpp contains the logic to
produce ucnid.h.

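One common way to turn per-codepoint data into a compact lookup table is to
merge runs of consecutive codepoints that share a value. The Python sketch
below shows that idea only; collapse_to_ranges is a hypothetical function, and
this README does not state that gen_wcwidth.py uses this representation.

```python
def collapse_to_ranges(widths):
    """Merge a {codepoint: width} mapping into sorted (first, last, width) runs.

    Consecutive codepoints with the same width end up in one run; a gap in
    the codepoints or a change of width starts a new run.
    """
    ranges = []
    for codepoint in sorted(widths):
        width = widths[codepoint]
        if ranges and ranges[-1][1] == codepoint - 1 and ranges[-1][2] == width:
            # Extend the current run by one codepoint.
            ranges[-1] = (ranges[-1][0], codepoint, width)
        else:
            ranges.append((codepoint, codepoint, width))
    return ranges
```

The output is sorted by codepoint, so it can be searched with a binary search
at lookup time.
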
The procedure to update GCC's Unicode support is the following:

1.  Update the six Unicode data files from the above URLs.

2.  Update the two glibc files in from_glibc/ from glibc's git. Update
    the commit hash above in this README.

3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
    (where X.Y is the version of the Unicode standard corresponding to the
    Unicode data files being used, most recently, 16.0.0).

4.  Update the Unicode Copyright years in libcpp/makeucnid.cc and in
    libcpp/makeuname2c.cc up to the year in which the Unicode
    standard was released.

5.  Compile makeucnid, e.g. with:
      g++ -O2 ../../libcpp/makeucnid.cc -o ../../libcpp/makeucnid

6.  Generate ucnid.h as follows:
      ../../libcpp/makeucnid ../../libcpp/ucnid.tab UnicodeData.txt \
        DerivedNormalizationProps.txt DerivedCoreProperties.txt \
        > ../../libcpp/ucnid.h

7.  Read the corresponding version of the Unicode standard and update the
    generated_ranges table in libcpp/makeuname2c.cc accordingly (for
    Unicode 16, all the needed information was in Table 4-8).

8.  Compile makeuname2c, e.g. with:
      g++ -O2 ../../libcpp/makeuname2c.cc -o ../../libcpp/makeuname2c

9.  Generate uname2c.h as follows:
      ../../libcpp/makeuname2c UnicodeData.txt NameAliases.txt \
        > ../../libcpp/uname2c.h

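The command-line steps of the procedure above (3, 5, 6, 8 and 9) can be
collected in one place. The Python sketch below only builds and prints the
shell commands quoted in this README, so they can be reviewed before being
run from this directory; regeneration_commands is a hypothetical helper, and
the manual steps (1, 2, 4 and 7) still have to be done by hand at the right
points in the sequence.

```python
LIBCPP = "../../libcpp"


def regeneration_commands(unicode_version):
    """Return the shell commands for steps 3, 5, 6, 8 and 9, in order."""
    return [
        # Step 3: regenerate the wcwidth table.
        f"./gen_wcwidth.py {unicode_version} > {LIBCPP}/generated_cpp_wcwidth.h",
        # Step 5: compile makeucnid.
        f"g++ -O2 {LIBCPP}/makeucnid.cc -o {LIBCPP}/makeucnid",
        # Step 6: generate ucnid.h.
        f"{LIBCPP}/makeucnid {LIBCPP}/ucnid.tab UnicodeData.txt"
        f" DerivedNormalizationProps.txt DerivedCoreProperties.txt"
        f" > {LIBCPP}/ucnid.h",
        # Step 8: compile makeuname2c.
        f"g++ -O2 {LIBCPP}/makeuname2c.cc -o {LIBCPP}/makeuname2c",
        # Step 9: generate uname2c.h.
        f"{LIBCPP}/makeuname2c UnicodeData.txt NameAliases.txt"
        f" > {LIBCPP}/uname2c.h",
    ]


if __name__ == "__main__":
    # The version string is the one this README names as most recent.
    for command in regeneration_commands("16.0.0"):
        print(command)
```

Printing rather than executing keeps the sketch harmless: nothing is
overwritten until the commands are run by hand.
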
See gen_libstdcxx_unicode_data.py for instructions on updating the lookup
tables in libstdc++.