Index: gcc/ChangeLog

2005-03-14  Geoffrey Keating  <geoffk@apple.com>

	* doc/cppopts.texi (-fexec-charset): Add concept index entry.
	(-fwide-exec-charset): Likewise.
	(-finput-charset): Likewise.
	* doc/invoke.texi (Warning Options): Document -Wnormalized=.
	* c-opts.c (c_common_handle_option): Handle -Wnormalized=.
	* c.opt (Wnormalized): New.

Index: libcpp/ChangeLog
2005-03-14  Geoffrey Keating  <geoffk@apple.com>

	* init.c (cpp_create_reader): Default warn_normalize to normalized_C.
	* charset.c: Update for new format of ucnid.h.
	(ucn_valid_in_identifier): Update for new format of ucnid.h.
	Add NST parameter, and update it; update callers.
	(cpp_valid_ucn): Add NST parameter, update callers.  Replace abort
	with cpp_error.
	(convert_ucn): Pass normalize_state to cpp_valid_ucn.
	* internal.h (struct normalize_state): New.
	(INITIAL_NORMALIZE_STATE): New.
	(NORMALIZE_STATE_RESULT): New.
	(NORMALIZE_STATE_UPDATE_IDNUM): New.
	(_cpp_valid_ucn): New.
	* lex.c (warn_about_normalization): New.
	(forms_identifier_p): Add normalize_state parameter, update callers.
	(lex_identifier): Add normalize_state parameter, update callers.  Keep
	the state current.
	(lex_number): Likewise.
	(_cpp_lex_direct): Pass normalize_state to subroutines.  Check
	it with warn_about_normalization.
	* makeucnid.c: New.
	* ucnid.h: Replace.
	* ucnid.pl: Remove.
	* ucnid.tab: Make appropriate for input to makeucnid.c.  Remove
	comments about obsolete version of C++.
	* include/cpplib.h (enum cpp_normalize_level): New.
	(struct cpp_options): Add warn_normalize field.

Index: gcc/testsuite/ChangeLog
2005-03-14  Geoffrey Keating  <geoffk@apple.com>

	* gcc.dg/cpp/normalize-1.c: New.
	* gcc.dg/cpp/normalize-2.c: New.
	* gcc.dg/cpp/normalize-3.c: New.
	* gcc.dg/cpp/normalize-4.c: New.
	* gcc.dg/cpp/ucnid-4.c: New.
	* gcc.dg/cpp/ucnid-5.c: New.
	* g++.dg/cpp/normalize-1.C: New.
	* g++.dg/cpp/ucnid-1.C: New.

From-SVN: r96459
Geoffrey Keating 2005-03-15 00:36:33 +00:00 committed by Geoffrey Keating
parent cd8b38b9eb
commit 50668cf626
24 changed files with 1708 additions and 548 deletions

View File

@ -1,3 +1,12 @@
2005-03-14 Geoffrey Keating <geoffk@apple.com>
* doc/cppopts.texi (-fexec-charset): Add concept index entry.
(-fwide-exec-charset): Likewise.
(-finput-charset): Likewise.
* doc/invoke.texi (Warning Options): Document -Wnormalized=.
* c-opts.c (c_common_handle_option): Handle -Wnormalized=.
* c.opt (Wnormalized): New.
2005-03-14 Devang Patel <dpatel@apple.com>
* doc/invoke.texi: Add reference to Visibility document.

View File

@ -460,6 +460,19 @@ c_common_handle_option (size_t scode, const char *arg, int value)
cpp_opts->warn_multichar = value;
break;
case OPT_Wnormalized_:
if (!value || (arg && strcasecmp (arg, "none") == 0))
cpp_opts->warn_normalize = normalized_none;
else if (!arg || strcasecmp (arg, "nfkc") == 0)
cpp_opts->warn_normalize = normalized_KC;
else if (strcasecmp (arg, "id") == 0)
cpp_opts->warn_normalize = normalized_identifier_C;
else if (strcasecmp (arg, "nfc") == 0)
cpp_opts->warn_normalize = normalized_C;
else
error ("argument %qs to %<-Wnormalized%> not recognized", arg);
break;
case OPT_Wreturn_type:
warn_return_type = value;
break;
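The argument handling in the hunk above can be tried in isolation. The following is a minimal sketch, not the code in c-opts.c: `parse_wnormalized` and the `LEVEL_*` names are hypothetical stand-ins for the option handler and for `enum cpp_normalize_level`, and `value` is assumed to be 0 for the negated form of the option, as in the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Stand-in for enum cpp_normalize_level, redeclared so the sketch
   compiles on its own.  */
enum normalize_level { LEVEL_KC = 0, LEVEL_C, LEVEL_ID_C, LEVEL_NONE };

/* Hypothetical helper.  Stores the selected level in *OUT and returns 0,
   or returns -1 if ARG is not recognized.  A missing argument selects
   the strictest level, as in the hunk.  */
static int
parse_wnormalized (int value, const char *arg, enum normalize_level *out)
{
  assert (out != NULL);
  if (!value || (arg && strcasecmp (arg, "none") == 0))
    *out = LEVEL_NONE;
  else if (!arg || strcasecmp (arg, "nfkc") == 0)
    *out = LEVEL_KC;
  else if (strcasecmp (arg, "id") == 0)
    *out = LEVEL_ID_C;
  else if (strcasecmp (arg, "nfc") == 0)
    *out = LEVEL_C;
  else
    return -1;
  return 0;
}
```

Because the comparison is case-insensitive, `NFC` and `nfc` select the same level in this sketch.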

View File

@ -285,6 +285,10 @@ Wnonnull
C ObjC Var(warn_nonnull)
Warn about NULL being passed to argument slots marked as requiring non-NULL
Wnormalized=
C ObjC C++ ObjC++ Joined
-Wnormalized=<id|nfc|nfkc> Warn about non-normalised Unicode strings
Wold-style-cast
C++ ObjC++ Var(warn_old_style_cast)
Warn if a C-style cast is used in a program

View File

@ -530,12 +530,14 @@ ignored. The default is 8.
@item -fexec-charset=@var{charset}
@opindex fexec-charset
@cindex character set, execution
Set the execution character set, used for string and character
constants. The default is UTF-8. @var{charset} can be any encoding
supported by the system's @code{iconv} library routine.
@item -fwide-exec-charset=@var{charset}
@opindex fwide-exec-charset
@cindex character set, wide execution
Set the wide execution character set, used for wide string and
character constants. The default is UTF-32 or UTF-16, whichever
corresponds to the width of @code{wchar_t}. As with
@ -545,6 +547,7 @@ problems with encodings that do not fit exactly in @code{wchar_t}.
@item -finput-charset=@var{charset}
@opindex finput-charset
@cindex character set, input
Set the input character set, used for translation from the character
set of the input file to the source character set used by GCC@. If the
locale does not specify, or GCC cannot get this information from the

View File

@ -3039,6 +3039,51 @@ Do not warn if a multicharacter constant (@samp{'FOOF'}) is used.
Usually they indicate a typo in the user's code, as they have
implementation-defined values, and should not be used in portable code.
@item -Wnormalized=<none|id|nfc|nfkc>
@opindex Wnormalized
@cindex NFC
@cindex NFKC
@cindex character set, input normalization
In ISO C and ISO C++, two identifiers are different if they are
different sequences of characters. However, sometimes when characters
outside the basic ASCII character set are used, you can have two
different character sequences that look the same. To avoid confusion,
the ISO 10646 standard sets out some @dfn{normalization rules} which
when applied ensure that two sequences that look the same are turned into
the same sequence. GCC can warn you if you are using identifiers which
have not been normalized; this option controls that warning.
There are four levels of warning that GCC supports. The default is
@option{-Wnormalized=nfc}, which warns about any identifier which is
not in the ISO 10646 ``C'' normalized form, @dfn{NFC}. NFC is the
recommended form for most uses.
Unfortunately, there are some characters which ISO C and ISO C++ allow
in identifiers that when turned into NFC aren't allowable as
identifiers. That is, there's no way to use these symbols in portable
ISO C or C++ and have all your identifiers in NFC.
@option{-Wnormalized=id} suppresses the warning for these characters.
It is hoped that future versions of the standards involved will correct
this, which is why this option is not the default.
You can switch the warning off for all characters by writing
@option{-Wnormalized=none}. You would only want to do this if you
were using some other normalization scheme (like ``D''), because
otherwise you can easily create bugs that are literally impossible to see.
Some characters in ISO 10646 have distinct meanings but look identical
in some fonts or display methodologies, especially once formatting has
been applied. For instance @code{\u207F}, ``SUPERSCRIPT LATIN SMALL
LETTER N'', will display just like a regular @code{n} which has been
placed in a superscript. ISO 10646 defines the @dfn{NFKC}
normalization scheme to convert all these into a standard form as
well, and GCC will warn if your code is not in NFKC if you use
@option{-Wnormalized=nfkc}. This warning is comparable to warning
about every identifier that contains the letter O because it might be
confused with the digit 0, and so is not the default, but may be
useful as a local coding convention if the programming environment is
unable to be fixed to display these characters distinctly.
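The problem the documentation describes can be seen without any compiler support. The sketch below is an illustration only and is not part of the patch: it spells "e with acute accent" twice, once precomposed (U+00E9) and once as a base letter plus a combining mark (U+0065 U+0301), using explicit UTF-8 bytes. The names `precomposed`, `decomposed` and `same_spelling` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* UTF-8 for U+00E9, the precomposed (NFC) spelling.  */
static const char precomposed[] = "\xC3\xA9";
/* UTF-8 for U+0065 U+0301: 'e' followed by a combining acute accent.  */
static const char decomposed[] = "e\xCC\x81";

/* Nonzero if the two spellings are byte-for-byte identical, which is
   the sense in which two identifiers are "the same" sequence.  */
static int
same_spelling (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcmp (a, b) == 0;
}
```

Both strings typically render as the same glyph, yet `same_spelling` reports them as different, which is the kind of invisible mismatch the warning is meant to catch in identifiers.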
@item -Wno-deprecated-declarations
@opindex Wno-deprecated-declarations
Do not warn about uses of functions, variables, and types marked as

View File

@ -1,3 +1,14 @@
2005-03-14 Geoffrey Keating <geoffk@apple.com>
* gcc.dg/cpp/normalize-1.c: New.
* gcc.dg/cpp/normalize-2.c: New.
* gcc.dg/cpp/normalize-3.c: New.
* gcc.dg/cpp/normalize-4.c: New.
* gcc.dg/cpp/ucnid-4.c: New.
* gcc.dg/cpp/ucnid-5.c: New.
* g++.dg/cpp/normalize-1.C: New.
* g++.dg/cpp/ucnid-1.C: New.
2005-03-14 Alexandre Oliva <aoliva@redhat.com>
* gcc.dg/pr18628.c: New.

View File

@ -0,0 +1,34 @@
/* { dg-do preprocess } */
/* { dg-options "-Wnormalized=id" } */
\u00AA
\u00B7
\u0F43 /* { dg-warning "not in NFC" } */
a\u05B8\u05B9\u05B9\u05BBb
a\u05BB\u05B9\u05B8\u05B9b /* { dg-warning "not in NFC" } */
\u09CB
\u09C7\u09BE /* { dg-warning "not in NFC" } */
\u0B4B
\u0B47\u0B3E /* { dg-warning "not in NFC" } */
\u0BCA
\u0BC6\u0BBE /* { dg-warning "not in NFC" } */
\u0BCB
\u0BC7\u0BBE /* { dg-warning "not in NFC" } */
\u0CCA
\u0CC6\u0CC2 /* { dg-warning "not in NFC" } */
\u0D4A
\u0D46\u0D3E /* { dg-warning "not in NFC" } */
\u0D4B
\u0D47\u0D3E /* { dg-warning "not in NFC" } */
K
\u212A
\u03AC
\u1F71 /* { dg-warning "not in NFC" } */
\uAC00
\u1100\u1161
\uAC01
\u1100\u1161\u11A8
\uAC00\u11A8
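The Hangul lines at the end of this test depend on the arithmetic relationship between conjoining jamo and precomposed syllables. The sketch below shows that arithmetic under the usual Unicode constants; it is not the code in charset.c, and `compose_hangul` and `is_lv_syllable` are hypothetical names.

```c
#include <assert.h>

enum { S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
       V_COUNT = 21, T_COUNT = 28 };

/* Compose a leading consonant L, a vowel V and an optional trailing
   consonant T (0 if absent) into one precomposed syllable.  */
static unsigned
compose_hangul (unsigned l, unsigned v, unsigned t)
{
  assert (l >= 0x1100 && l <= 0x1112);
  assert (v >= 0x1161 && v <= 0x1175);
  assert (t == 0 || (t >= 0x11A8 && t <= 0x11C2));
  unsigned t_index = t ? t - T_BASE : 0;
  return S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index;
}

/* Nonzero if S is a syllable without a trailing consonant, that is,
   one that a following trailing consonant could still combine with.  */
static int
is_lv_syllable (unsigned s)
{
  return s >= 0xAC00 && s <= 0xD7A3 && (s - S_BASE) % T_COUNT == 0;
}
```

Under these constants `\u1100\u1161` composes to `\uAC00` and `\u1100\u1161\u11A8` to `\uAC01`, which matches the pairs listed in the test.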

View File

@ -0,0 +1,17 @@
/* { dg-do preprocess } */
/* { dg-options "-pedantic" } */
\u00AA /* { dg-error "not valid in an identifier" } */
\u00AB /* { dg-error "not valid in an identifier" } */
\u00B6 /* { dg-error "not valid in an identifier" } */
\u00BA /* { dg-error "not valid in an identifier" } */
\u00C0
\u00D6
\u0384
\u0669 /* { dg-error "not valid in an identifier" } */
A\u0669 /* { dg-error "not valid in an identifier" } */
0\u00BA /* { dg-error "not valid in an identifier" } */
0\u0669 /* { dg-error "not valid in an identifier" } */
\u0E59
A\u0E59

View File

@ -0,0 +1,34 @@
/* { dg-do preprocess } */
/* { dg-options "-std=c99" } */
\u00AA
\u00B7
\u0F43 /* { dg-warning "not in NFC" } */
a\u05B8\u05B9\u05B9\u05BBb
a\u05BB\u05B9\u05B8\u05B9b /* { dg-warning "not in NFC" } */
\u09CB
\u09C7\u09BE /* { dg-warning "not in NFC" } */
\u0B4B
\u0B47\u0B3E /* { dg-warning "not in NFC" } */
\u0BCA
\u0BC6\u0BBE /* { dg-warning "not in NFC" } */
\u0BCB
\u0BC7\u0BBE /* { dg-warning "not in NFC" } */
\u0CCA
\u0CC6\u0CC2 /* { dg-warning "not in NFC" } */
\u0D4A
\u0D46\u0D3E /* { dg-warning "not in NFC" } */
\u0D4B
\u0D47\u0D3E /* { dg-warning "not in NFC" } */
K
\u212A /* { dg-warning "not in NFC" } */
\u03AC
\u1F71 /* { dg-warning "not in NFC" } */
\uAC00
\u1100\u1161 /* { dg-warning "not in NFC" } */
\uAC01
\u1100\u1161\u11A8 /* { dg-warning "not in NFC" } */
\uAC00\u11A8 /* { dg-warning "not in NFC" } */

View File

@ -0,0 +1,34 @@
/* { dg-do preprocess } */
/* { dg-options "-std=c99 -Wnormalized=nfkc" } */
\u00AA /* { dg-warning "not in NFKC" } */
\u00B7
\u0F43 /* { dg-warning "not in NFC" } */
a\u05B8\u05B9\u05B9\u05BBb
a\u05BB\u05B9\u05B8\u05B9b /* { dg-warning "not in NFC" } */
\u09CB
\u09C7\u09BE /* { dg-warning "not in NFC" } */
\u0B4B
\u0B47\u0B3E /* { dg-warning "not in NFC" } */
\u0BCA
\u0BC6\u0BBE /* { dg-warning "not in NFC" } */
\u0BCB
\u0BC7\u0BBE /* { dg-warning "not in NFC" } */
\u0CCA
\u0CC6\u0CC2 /* { dg-warning "not in NFC" } */
\u0D4A
\u0D46\u0D3E /* { dg-warning "not in NFC" } */
\u0D4B
\u0D47\u0D3E /* { dg-warning "not in NFC" } */
K
\u212A /* { dg-warning "not in NFC" } */
\u03AC
\u1F71 /* { dg-warning "not in NFC" } */
\uAC00
\u1100\u1161 /* { dg-warning "not in NFC" } */
\uAC01
\u1100\u1161\u11A8 /* { dg-warning "not in NFC" } */
\uAC00\u11A8 /* { dg-warning "not in NFC" } */

View File

@ -0,0 +1,34 @@
/* { dg-do preprocess } */
/* { dg-options "-std=c99 -Wnormalized=id" } */
\u00AA
\u00B7
\u0F43 /* { dg-warning "not in NFC" } */
a\u05B8\u05B9\u05B9\u05BBb
a\u05BB\u05B9\u05B8\u05B9b /* { dg-warning "not in NFC" } */
\u09CB
\u09C7\u09BE /* { dg-warning "not in NFC" } */
\u0B4B
\u0B47\u0B3E /* { dg-warning "not in NFC" } */
\u0BCA
\u0BC6\u0BBE /* { dg-warning "not in NFC" } */
\u0BCB
\u0BC7\u0BBE /* { dg-warning "not in NFC" } */
\u0CCA
\u0CC6\u0CC2 /* { dg-warning "not in NFC" } */
\u0D4A
\u0D46\u0D3E /* { dg-warning "not in NFC" } */
\u0D4B
\u0D47\u0D3E /* { dg-warning "not in NFC" } */
K
\u212A
\u03AC
\u1F71 /* { dg-warning "not in NFC" } */
\uAC00
\u1100\u1161
\uAC01
\u1100\u1161\u11A8
\uAC00\u11A8

View File

@ -0,0 +1,34 @@
/* { dg-do preprocess } */
/* { dg-options "-std=c99 -Wnormalized=none" } */
\u00AA
\u00B7
\u0F43
a\u05B8\u05B9\u05B9\u05BBb
a\u05BB\u05B9\u05B8\u05B9b
\u09CB
\u09C7\u09BE
\u0B4B
\u0B47\u0B3E
\u0BCA
\u0BC6\u0BBE
\u0BCB
\u0BC7\u0BBE
\u0CCA
\u0CC6\u0CC2
\u0D4A
\u0D46\u0D3E
\u0D4B
\u0D47\u0D3E
K
\u212A
\u03AC
\u1F71
\uAC00
\u1100\u1161
\uAC01
\u1100\u1161\u11A8
\uAC00\u11A8

View File

@ -0,0 +1,17 @@
/* { dg-do preprocess } */
/* { dg-options "-std=c99" } */
\u00AA
\u00AB /* { dg-error "not valid in an identifier" } */
\u00B6 /* { dg-error "not valid in an identifier" } */
\u00BA
\u00C0
\u00D6
\u0384
\u0669 /* { dg-error "not valid at the start of an identifier" } */
A\u0669
0\u00BA
0\u0669
\u0E59 /* { dg-error "not valid at the start of an identifier" } */
A\u0E59

View File

@ -0,0 +1,17 @@
/* { dg-do preprocess } */
/* { dg-options "-std=c99 -pedantic" } */
\u00AA
\u00AB /* { dg-error "not valid in an identifier" } */
\u00B6 /* { dg-error "not valid in an identifier" } */
\u00BA
\u00C0
\u00D6
\u0384 /* { dg-error "not valid in an identifier" } */
\u0669 /* { dg-error "not valid at the start of an identifier" } */
A\u0669
0\u00BA
0\u0669
\u0E59 /* { dg-error "not valid at the start of an identifier" } */
A\u0E59

View File

@ -1,3 +1,32 @@
2005-03-14 Geoffrey Keating <geoffk@apple.com>
* init.c (cpp_create_reader): Default warn_normalize to normalized_C.
* charset.c: Update for new format of ucnid.h.
(ucn_valid_in_identifier): Update for new format of ucnid.h.
Add NST parameter, and update it; update callers.
(cpp_valid_ucn): Add NST parameter, update callers. Replace abort
with cpp_error.
(convert_ucn): Pass normalize_state to cpp_valid_ucn.
* internal.h (struct normalize_state): New.
(INITIAL_NORMALIZE_STATE): New.
(NORMALIZE_STATE_RESULT): New.
(NORMALIZE_STATE_UPDATE_IDNUM): New.
(_cpp_valid_ucn): New.
* lex.c (warn_about_normalization): New.
(forms_identifier_p): Add normalize_state parameter, update callers.
(lex_identifier): Add normalize_state parameter, update callers. Keep
the state current.
(lex_number): Likewise.
(_cpp_lex_direct): Pass normalize_state to subroutines. Check
it with warn_about_normalization.
* makeucnid.c: New.
* ucnid.h: Replace.
* ucnid.pl: Remove.
* ucnid.tab: Make appropriate for input to makeucnid.c. Remove
comments about obsolete version of C++.
* include/cpplib.h (enum cpp_normalize_level): New.
(struct cpp_options): Add warn_normalize field.
2005-03-11 Geoffrey Keating <geoffk@apple.com>
* directives.c (glue_header_name): Update call to cpp_spell_token.

View File

@ -22,7 +22,6 @@ Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
#include "system.h"
#include "cpplib.h"
#include "internal.h"
#include "ucnid.h"
/* Character set handling for C-family languages.
@ -786,43 +785,128 @@ width_to_mask (size_t width)
return ((size_t) 1 << width) - 1;
}
/* A large table of unicode character information. */
enum {
/* Valid in a C99 identifier? */
C99 = 1,
/* Valid in a C99 identifier, but not as the first character? */
DIG = 2,
/* Valid in a C++ identifier? */
CXX = 4,
/* NFC representation is not valid in an identifier? */
CID = 8,
/* Might be valid NFC form? */
NFC = 16,
/* Might be valid NFKC form? */
NKC = 32,
/* Certain preceding characters might make it not valid NFC/NFKC form? */
CTX = 64
};
static const struct {
/* Bitmap of flags above. */
unsigned char flags;
/* Combining class of the character. */
unsigned char combine;
/* Last character in the range described by this entry. */
unsigned short end;
} ucnranges[] = {
#include "ucnid.h"
};
/* Returns 1 if C is valid in an identifier, 2 if C is valid except at
the start of an identifier, and 0 if C is not valid in an
identifier. We assume C has already gone through the checks of
_cpp_valid_ucn. The algorithm is a simple binary search on the
table defined in cppucnid.h. */
_cpp_valid_ucn. Also update NST for C if returning nonzero. The
algorithm is a simple binary search on the table defined in
ucnid.h. */
static int
ucn_valid_in_identifier (cpp_reader *pfile, cppchar_t c)
ucn_valid_in_identifier (cpp_reader *pfile, cppchar_t c,
struct normalize_state *nst)
{
int mn, mx, md;
mn = -1;
mx = ARRAY_SIZE (ucnranges);
while (mx - mn > 1)
if (c > 0xFFFF)
return 0;
mn = 0;
mx = ARRAY_SIZE (ucnranges) - 1;
while (mx != mn)
{
md = (mn + mx) / 2;
if (c < ucnranges[md].lo)
if (c <= ucnranges[md].end)
mx = md;
else if (c > ucnranges[md].hi)
mn = md;
else
goto found;
mn = md + 1;
}
return 0;
found:
/* When -pedantic, we require the character to have been listed by
the standard for the current language. Otherwise, we accept the
union of the acceptable sets for C++98 and C99. */
if (! (ucnranges[mn].flags & (C99 | CXX)))
return 0;
if (CPP_PEDANTIC (pfile)
&& ((CPP_OPTION (pfile, c99) && !(ucnranges[md].flags & C99))
&& ((CPP_OPTION (pfile, c99) && !(ucnranges[mn].flags & C99))
|| (CPP_OPTION (pfile, cplusplus)
&& !(ucnranges[md].flags & CXX))))
&& !(ucnranges[mn].flags & CXX))))
return 0;
/* Update NST. */
if (ucnranges[mn].combine != 0 && ucnranges[mn].combine < nst->prev_class)
nst->level = normalized_none;
else if (ucnranges[mn].flags & CTX)
{
bool safe;
cppchar_t p = nst->previous;
/* Easy cases from Bengali, Oriya, Tamil, Kannada, and Malayalam. */
if (c == 0x09BE)
safe = p != 0x09C7; /* Use 09CB instead of 09C7 09BE. */
else if (c == 0x0B3E)
safe = p != 0x0B47; /* Use 0B4B instead of 0B47 0B3E. */
else if (c == 0x0BBE)
safe = p != 0x0BC6 && p != 0x0BC7; /* Use 0BCA/0BCB instead. */
else if (c == 0x0CC2)
safe = p != 0x0CC6; /* Use 0CCA instead of 0CC6 0CC2. */
else if (c == 0x0D3E)
safe = p != 0x0D46 && p != 0x0D47; /* Use 0D4A/0D4B instead. */
/* For Hangul, characters in the range AC00-D7A3 are NFC/NFKC,
and are combined algorithmically from a sequence of the form
1100-1112 1161-1175 11A8-11C2
(if the third is not present, it is treated as 11A7, which is not
really a valid character).
Unfortunately, C99 allows (only) the NFC form, but C++ allows
only the combining characters. */
else if (c >= 0x1161 && c <= 0x1175)
safe = p < 0x1100 || p > 0x1112;
else if (c >= 0x11A8 && c <= 0x11C2)
safe = (p < 0xAC00 || p > 0xD7A3 || (p - 0xAC00) % 28 != 0);
else
{
/* Uh-oh, someone updated ucnid.h without updating this code. */
cpp_error (pfile, CPP_DL_ICE, "Character %x might not be NFKC", c);
safe = true;
}
if (!safe && c < 0x1161)
nst->level = normalized_none;
else if (!safe)
nst->level = MAX (nst->level, normalized_identifier_C);
}
else if (ucnranges[mn].flags & NKC)
;
else if (ucnranges[mn].flags & NFC)
nst->level = MAX (nst->level, normalized_C);
else if (ucnranges[mn].flags & CID)
nst->level = MAX (nst->level, normalized_identifier_C);
else
nst->level = normalized_none;
nst->previous = c;
nst->prev_class = ucnranges[mn].combine;
/* In C99, UCN digits may not begin identifiers. */
if (CPP_OPTION (pfile, c99) && (ucnranges[md].flags & DIG))
if (CPP_OPTION (pfile, c99) && (ucnranges[mn].flags & DIG))
return 2;
return 1;
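The new lookup searches a table in which each entry records only the last code point of its range, so every code point up to the final `end` falls in exactly one entry. A minimal standalone sketch of that search follows; the table contents are made up for illustration and are not taken from ucnid.h, and `struct range` and `find_range` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct range { unsigned char flags; unsigned short end; };

/* Illustrative table only.  Each entry covers the code points after the
   previous entry's END up to and including its own END.  */
static const struct range ranges[] = {
  { 0, 0x00a9 },
  { 1, 0x00aa },
  { 0, 0x00bf },
  { 1, 0x00d6 },
  { 0, 0xffff },
};

/* Return the index of the entry whose range contains C.  The table must
   be sorted by END and its last END must be at least C.  */
static size_t
find_range (const struct range *tab, size_t n, unsigned c)
{
  size_t mn = 0, mx = n - 1;
  assert (n > 0 && c <= tab[n - 1].end);
  while (mx != mn)
    {
      size_t md = (mn + mx) / 2;
      if (c <= tab[md].end)
        mx = md;        /* C is in entry MD or an earlier one.  */
      else
        mn = md + 1;    /* C is strictly after entry MD.  */
    }
  return mn;
}
```

Since the ranges tile the whole search space, the loop always ends on a valid entry, and "not found" becomes a flag test on that entry rather than a separate exit path.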
@ -853,7 +937,8 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppchar_t c)
cppchar_t
_cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
const uchar *limit, int identifier_pos)
const uchar *limit, int identifier_pos,
struct normalize_state *nst)
{
cppchar_t result, c;
unsigned int length;
@ -873,7 +958,10 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
else if (str[-1] == 'U')
length = 8;
else
abort();
{
cpp_error (pfile, CPP_DL_ICE, "In _cpp_valid_ucn but not a UCN");
length = 4;
}
result = 0;
do
@ -915,10 +1003,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
CPP_OPTION (pfile, warn_dollars) = 0;
cpp_error (pfile, CPP_DL_PEDWARN, "'$' in identifier or number");
}
NORMALIZE_STATE_UPDATE_IDNUM (nst);
}
else if (identifier_pos)
{
int validity = ucn_valid_in_identifier (pfile, result);
int validity = ucn_valid_in_identifier (pfile, result, nst);
if (validity == 0)
cpp_error (pfile, CPP_DL_ERROR,
@ -950,9 +1039,10 @@ convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit,
int rval;
struct cset_converter cvt
= wide ? pfile->wide_cset_desc : pfile->narrow_cset_desc;
struct normalize_state nst = INITIAL_NORMALIZE_STATE;
from++; /* Skip u/U. */
ucn = _cpp_valid_ucn (pfile, &from, limit, 0);
ucn = _cpp_valid_ucn (pfile, &from, limit, 0, &nst);
rval = one_cppchar_to_utf8 (ucn, &bufp, &bytesleft);
if (rval)

View File

@ -236,6 +236,19 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
/* Style of header dependencies to generate. */
enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
/* The possible normalization levels, from most restrictive to least. */
enum cpp_normalize_level {
/* In NFKC. */
normalized_KC = 0,
/* In NFC. */
normalized_C,
/* In NFC, except for subsequences where being in NFC would make
the identifier invalid. */
normalized_identifier_C,
/* Not normalized at all. */
normalized_none
};
/* This structure is nested inside struct cpp_reader, and
carries all the options visible to the command line. */
struct cpp_options
@ -373,6 +386,10 @@ struct cpp_options
/* Holds the name of the input character set. */
const char *input_charset;
/* The minimum permitted level of normalization before a warning
is generated. */
enum cpp_normalize_level warn_normalize;
/* True to warn about precompiled header files we couldn't use. */
bool warn_invalid_pch;
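Because the enum is ordered from most to least restrictive, deciding whether to warn reduces to one integer comparison between the requested level and the level measured for a token. The sketch below restates that comparison on its own; `should_warn` and the `LEVEL_*` names are hypothetical stand-ins, not identifiers from cpplib.h.

```c
#include <assert.h>

/* Stand-in for enum cpp_normalize_level, in the same order.  */
enum normalize_level { LEVEL_KC = 0, LEVEL_C, LEVEL_ID_C, LEVEL_NONE };

/* Nonzero if a token measured at level ACTUAL should be diagnosed when
   the user asked for at least level REQUESTED.  */
static int
should_warn (enum normalize_level requested, enum normalize_level actual)
{
  assert (requested <= LEVEL_NONE && actual <= LEVEL_NONE);
  return requested < actual;
}
```

With the default request of NFC, a token that is in NFC but not NFKC passes silently, while one that is only acceptable as an identifier exception is reported.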

View File

@ -153,6 +153,7 @@ cpp_create_reader (enum c_lang lang, hash_table *table,
CPP_OPTION (pfile, dollars_in_ident) = 1;
CPP_OPTION (pfile, warn_dollars) = 1;
CPP_OPTION (pfile, warn_variadic_macros) = 1;
CPP_OPTION (pfile, warn_normalize) = normalized_C;
/* Default CPP arithmetic to something sensible for the host for the
benefit of dumb users like fix-header. */

View File

@ -564,8 +564,31 @@ extern unsigned char *_cpp_copy_replacement_text (const cpp_macro *,
extern size_t _cpp_replacement_text_len (const cpp_macro *);
/* In charset.c. */
/* The normalization state at this point in the sequence.
It starts initialized to all zeros, and at the end
'level' is the normalization level of the sequence. */
struct normalize_state
{
/* The previous character. */
cppchar_t previous;
/* The combining class of the previous character. */
unsigned char prev_class;
/* The lowest normalization level so far. */
enum cpp_normalize_level level;
};
#define INITIAL_NORMALIZE_STATE { 0, 0, normalized_KC }
#define NORMALIZE_STATE_RESULT(st) ((st)->level)
/* We saw a character that matches ISIDNUM(), update a
normalize_state appropriately. */
#define NORMALIZE_STATE_UPDATE_IDNUM(st) \
((st)->previous = 0, (st)->prev_class = 0)
extern cppchar_t _cpp_valid_ucn (cpp_reader *, const unsigned char **,
const unsigned char *, int);
const unsigned char *, int,
struct normalize_state *state);
extern void _cpp_destroy_iconv (cpp_reader *);
extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
unsigned char *, size_t, size_t,

View File

@ -53,9 +53,6 @@ static const struct token_spelling token_spellings[N_TTYPES] = { TTYPE_TABLE };
static void add_line_note (cpp_buffer *, const uchar *, unsigned int);
static int skip_line_comment (cpp_reader *);
static void skip_whitespace (cpp_reader *, cppchar_t);
static cpp_hashnode *lex_identifier (cpp_reader *, const uchar *, bool);
static void lex_number (cpp_reader *, cpp_string *);
static bool forms_identifier_p (cpp_reader *, int);
static void lex_string (cpp_reader *, cpp_token *, const uchar *);
static void save_comment (cpp_reader *, cpp_token *, const uchar *, cppchar_t);
static void create_literal (cpp_reader *, cpp_token *, const uchar *,
@ -430,10 +427,36 @@ name_p (cpp_reader *pfile, const cpp_string *string)
return 1;
}
/* After parsing an identifier or other sequence, produce a warning about
sequences not in NFC/NFKC. */
static void
warn_about_normalization (cpp_reader *pfile,
const cpp_token *token,
const struct normalize_state *s)
{
if (CPP_OPTION (pfile, warn_normalize) < NORMALIZE_STATE_RESULT (s)
&& !pfile->state.skipping)
{
/* Make sure that the token is printed using UCNs, even
if we'd otherwise happily print UTF-8. */
unsigned char *buf = xmalloc (cpp_token_len (token));
size_t sz;
sz = cpp_spell_token (pfile, token, buf, false) - buf;
if (NORMALIZE_STATE_RESULT (s) == normalized_C)
cpp_error_with_line (pfile, CPP_DL_WARNING, token->src_loc, 0,
"`%.*s' is not in NFKC", (int) sz, buf);
else
cpp_error_with_line (pfile, CPP_DL_WARNING, token->src_loc, 0,
"`%.*s' is not in NFC", (int) sz, buf);
}
}
/* Returns TRUE if the sequence starting at buffer->cur is valid in
an identifier. FIRST is TRUE if this starts an identifier. */
static bool
forms_identifier_p (cpp_reader *pfile, int first)
forms_identifier_p (cpp_reader *pfile, int first,
struct normalize_state *state)
{
cpp_buffer *buffer = pfile->buffer;
@ -457,7 +480,8 @@ forms_identifier_p (cpp_reader *pfile, int first)
&& (buffer->cur[1] == 'u' || buffer->cur[1] == 'U'))
{
buffer->cur += 2;
if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first))
if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
state))
return true;
buffer->cur -= 2;
}
@ -467,7 +491,8 @@ forms_identifier_p (cpp_reader *pfile, int first)
/* Lex an identifier starting at BUFFER->CUR - 1. */
static cpp_hashnode *
lex_identifier (cpp_reader *pfile, const uchar *base, bool starts_ucn)
lex_identifier (cpp_reader *pfile, const uchar *base, bool starts_ucn,
struct normalize_state *nst)
{
cpp_hashnode *result;
const uchar *cur;
@ -482,13 +507,16 @@ lex_identifier (cpp_reader *pfile, const uchar *base, bool starts_ucn)
cur++;
}
pfile->buffer->cur = cur;
if (starts_ucn || forms_identifier_p (pfile, false))
if (starts_ucn || forms_identifier_p (pfile, false, nst))
{
/* Slower version for identifiers containing UCNs (or $). */
do {
while (ISIDNUM (*pfile->buffer->cur))
pfile->buffer->cur++;
} while (forms_identifier_p (pfile, false));
{
pfile->buffer->cur++;
NORMALIZE_STATE_UPDATE_IDNUM (nst);
}
} while (forms_identifier_p (pfile, false, nst));
result = _cpp_interpret_identifier (pfile, base,
pfile->buffer->cur - base);
}
@ -524,7 +552,8 @@ lex_identifier (cpp_reader *pfile, const uchar *base, bool starts_ucn)
/* Lex a number to NUMBER starting at BUFFER->CUR - 1. */
static void
lex_number (cpp_reader *pfile, cpp_string *number)
lex_number (cpp_reader *pfile, cpp_string *number,
struct normalize_state *nst)
{
const uchar *cur;
const uchar *base;
@ -537,11 +566,14 @@ lex_number (cpp_reader *pfile, cpp_string *number)
/* N.B. ISIDNUM does not include $. */
while (ISIDNUM (*cur) || *cur == '.' || VALID_SIGN (*cur, cur[-1]))
cur++;
{
cur++;
NORMALIZE_STATE_UPDATE_IDNUM (nst);
}
pfile->buffer->cur = cur;
}
while (forms_identifier_p (pfile, false));
while (forms_identifier_p (pfile, false, nst));
number->len = cur - base;
dest = _cpp_unaligned_alloc (pfile, number->len + 1);
@ -897,9 +929,13 @@ _cpp_lex_direct (cpp_reader *pfile)
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
result->type = CPP_NUMBER;
lex_number (pfile, &result->val.str);
break;
{
struct normalize_state nst = INITIAL_NORMALIZE_STATE;
result->type = CPP_NUMBER;
lex_number (pfile, &result->val.str, &nst);
warn_about_normalization (pfile, result, &nst);
break;
}
case 'L':
/* 'L' may introduce wide characters or strings. */
@ -922,7 +958,12 @@ _cpp_lex_direct (cpp_reader *pfile)
case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
case 'Y': case 'Z':
result->type = CPP_NAME;
result->val.node = lex_identifier (pfile, buffer->cur - 1, false);
{
struct normalize_state nst = INITIAL_NORMALIZE_STATE;
result->val.node = lex_identifier (pfile, buffer->cur - 1, false,
&nst);
warn_about_normalization (pfile, result, &nst);
}
/* Convert named operators to their proper types. */
if (result->val.node->flags & NODE_OPERATOR)
@ -1067,8 +1108,10 @@ _cpp_lex_direct (cpp_reader *pfile)
result->type = CPP_DOT;
if (ISDIGIT (*buffer->cur))
{
struct normalize_state nst = INITIAL_NORMALIZE_STATE;
result->type = CPP_NUMBER;
lex_number (pfile, &result->val.str);
lex_number (pfile, &result->val.str, &nst);
warn_about_normalization (pfile, result, &nst);
}
else if (*buffer->cur == '.' && buffer->cur[1] == '.')
buffer->cur += 2, result->type = CPP_ELLIPSIS;
@ -1151,11 +1194,13 @@ _cpp_lex_direct (cpp_reader *pfile)
case '\\':
{
const uchar *base = --buffer->cur;
struct normalize_state nst = INITIAL_NORMALIZE_STATE;
if (forms_identifier_p (pfile, true))
if (forms_identifier_p (pfile, true, &nst))
{
result->type = CPP_NAME;
result->val.node = lex_identifier (pfile, base, true);
result->val.node = lex_identifier (pfile, base, true, &nst);
warn_about_normalization (pfile, result, &nst);
break;
}
buffer->cur++;

342
libcpp/makeucnid.c Normal file
View File

@ -0,0 +1,342 @@
/* Make ucnid.h from various sources.
Copyright (C) 2005 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
/* Run this program as
./makeucnid ucnid.tab UnicodeData.txt DerivedNormalizationProps.txt \
> ucnid.h
*/
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
enum {
C99 = 1,
CXX = 2,
digit = 4,
not_NFC = 8,
not_NFKC = 16,
maybe_not_NFC = 32
};
static unsigned flags[65536];
static unsigned short decomp[65536][2];
static unsigned char combining_value[65536];
/* Die! */
static void
fail (const char *s)
{
fprintf (stderr, "%s\n", s);
exit (1);
}
/* Read ucnid.tab and set the C99 and CXX flags in flags[]. */
static void
read_ucnid (const char *fname)
{
FILE *f = fopen (fname, "r");
unsigned fl = 0;
if (!f)
fail ("opening ucnid.tab");
for (;;)
{
char line[256];
if (!fgets (line, sizeof (line), f))
break;
if (strcmp (line, "[C99]\n") == 0)
fl = C99;
else if (strcmp (line, "[CXX]\n") == 0)
fl = CXX;
else if (isxdigit (line[0]))
{
char *l = line;
while (*l)
{
unsigned long start, end;
char *endptr;
start = strtoul (l, &endptr, 16);
if (endptr == l || (*endptr != '-' && ! isspace (*endptr)))
fail ("parsing ucnid.tab [1]");
l = endptr;
if (*l != '-')
end = start;
else
{
end = strtoul (l + 1, &endptr, 16);
if (end < start)
fail ("parsing ucnid.tab, end before start");
l = endptr;
if (! isspace (*l))
fail ("parsing ucnid.tab, junk after range");
}
while (isspace (*l))
l++;
if (end > 0xFFFF)
fail ("parsing ucnid.tab, end too large");
while (start <= end)
flags[start++] |= fl;
}
}
}
if (ferror (f))
fail ("reading ucnid.tab");
fclose (f);
}
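The range syntax that read_ucnid accepts (hexadecimal code points, optionally joined by `-`, separated by whitespace) can be exercised with a smaller function that returns an error code instead of exiting. This is a sketch under that reading of the format, not a replacement for the function above; `mark_ranges` is a hypothetical name and `seen` is assumed to have 65536 entries.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Mark every code point named on LINE in SEEN (65536 entries).
   Returns 0 on success, -1 on malformed input.  */
static int
mark_ranges (const char *line, unsigned char *seen)
{
  const char *l = line;
  assert (line != NULL && seen != NULL);
  while (*l)
    {
      char *endptr;
      unsigned long start = strtoul (l, &endptr, 16), end;
      if (endptr == l)
        return -1;                      /* No hex digits here.  */
      l = endptr;
      end = start;
      if (*l == '-')
        {
          end = strtoul (l + 1, &endptr, 16);
          if (endptr == l + 1 || end < start)
            return -1;                  /* Missing or reversed end.  */
          l = endptr;
        }
      if (end > 0xFFFF || (*l && !isspace ((unsigned char) *l)))
        return -1;                      /* Too large, or junk follows.  */
      while (isspace ((unsigned char) *l))
        l++;
      for (; start <= end; start++)
        seen[start] = 1;
    }
  return 0;
}
```

A line such as `00C0-00D6 00AA` marks the 23 code points from U+00C0 to U+00D6 plus U+00AA.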
/* Read UnicodeData.txt and set the 'digit' flag, and
also fill in the 'decomp' table to be the decompositions of
characters for which both the character decomposed and all the code
points in the decomposition are either C99 or CXX. */
static void
read_table (char *fname)
{
FILE * f = fopen (fname, "r");
if (!f)
fail ("opening UnicodeData.txt");
for (;;)
{
char line[256];
unsigned long codepoint, this_decomp[4];
char *l;
int i;
int decomp_useful;
if (!fgets (line, sizeof (line), f))
break;
codepoint = strtoul (line, &l, 16);
if (l == line || *l != ';')
fail ("parsing UnicodeData.txt, reading code point");
if (codepoint > 0xffff || ! (flags[codepoint] & (C99 | CXX)))
continue;
do {
l++;
} while (*l != ';');
/* Category value; things starting with 'N' are numbers of some
kind. */
if (*++l == 'N')
flags[codepoint] |= digit;
do {
l++;
} while (*l != ';');
/* Canonical combining class; in NFC/NFKC, they must be increasing
(or zero). */
if (! isdigit (*++l))
fail ("parsing UnicodeData.txt, combining class not number");
combining_value[codepoint] = strtoul (l, &l, 10);
if (*l++ != ';')
fail ("parsing UnicodeData.txt, junk after combining class");
/* Skip over bidi value. */
do {
l++;
} while (*l != ';');
/* Decomposition mapping. */
decomp_useful = flags[codepoint];
if (*++l == '<') /* Compatibility mapping. */
continue;
for (i = 0; i < 4; i++)
{
if (*l == ';')
break;
if (!isxdigit (*l))
fail ("parsing UnicodeData.txt, decomposition format");
this_decomp[i] = strtoul (l, &l, 16);
decomp_useful &= flags[this_decomp[i]];
while (isspace (*l))
l++;
}
if (i > 2) /* Decomposition too long. */
fail ("parsing UnicodeData.txt, decomposition too long");
if (decomp_useful)
while (--i >= 0)
decomp[codepoint][i] = this_decomp[i];
}
if (ferror (f))
fail ("reading UnicodeData.txt");
fclose (f);
}
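The function above walks the semicolon-separated fields of UnicodeData.txt by hand. As a rough sketch of the same idea, the helpers below locate a field by index and read the canonical combining class (field 3, counting from 0). The names `nth_field` and `combining_class` are hypothetical and are not part of makeucnid.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the start of field INDEX
   (0-based) in the semicolon-separated LINE, or NULL if the line has
   too few fields.  */
static const char *
nth_field (const char *line, int index)
{
  assert (line != NULL && index >= 0);
  while (index-- > 0)
    {
      line = strchr (line, ';');
      if (line == NULL)
        return NULL;
      line++;
    }
  return line;
}

/* Hypothetical helper: return the canonical combining class stored in
   field 3 of a UnicodeData.txt line, or -1 if the field is missing or
   is not a decimal number in the range 0..255 followed by ';'.  */
static long
combining_class (const char *line)
{
  const char *f = nth_field (line, 3);
  char *endp;
  unsigned long v;

  if (f == NULL)
    return -1;
  v = strtoul (f, &endp, 10);
  if (endp == f || *endp != ';' || v > 255)
    return -1;
  return (long) v;
}
```

Looking fields up by index costs a rescan of the line for each field, which is negligible for a build-time generator and keeps each field's validation in one place.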
/* Read DerivedNormalizationProps.txt and set the flags that say whether
a character is in NFC, NFKC, or is context-dependent. */
static void
read_derived (const char *fname)
{
FILE * f = fopen (fname, "r");
if (!f)
fail ("opening DerivedNormalizationProps.txt");
for (;;)
{
char line[256];
unsigned long start, end;
char *l;
bool not_NFC_p, not_NFKC_p, maybe_not_NFC_p;
if (!fgets (line, sizeof (line), f))
break;
not_NFC_p = (strstr (line, "; NFC_QC; N") != NULL);
not_NFKC_p = (strstr (line, "; NFKC_QC; N") != NULL);
maybe_not_NFC_p = (strstr (line, "; NFC_QC; M") != NULL);
if (! not_NFC_p && ! not_NFKC_p && ! maybe_not_NFC_p)
continue;
start = strtoul (line, &l, 16);
if (l == line)
fail ("parsing DerivedNormalizationProps.txt, reading start");
if (start > 0xffff)
continue;
if (*l == '.' && l[1] == '.')
end = strtoul (l + 2, &l, 16);
else
end = start;
while (start <= end)
flags[start++] |= ((not_NFC_p ? not_NFC : 0)
| (not_NFKC_p ? not_NFKC : 0)
| (maybe_not_NFC_p ? maybe_not_NFC : 0)
);
}
if (ferror (f))
fail ("reading DerivedNormalizationProps.txt");
fclose (f);
}
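The substring tests used above to classify a line of DerivedNormalizationProps.txt can be tried on their own. In this sketch the enum values and the name `quick_check_flags` are assumptions made for illustration; the real flag values are defined earlier in makeucnid.c.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag values only; the generator defines its own.  */
enum { NOT_NFC = 1, NOT_NFKC = 2, MAYBE_NOT_NFC = 4 };

/* Hypothetical helper: return the quick-check flags implied by one
   line of DerivedNormalizationProps.txt, or 0 if the line says nothing
   about NFC_QC or NFKC_QC failing.  */
static unsigned
quick_check_flags (const char *line)
{
  unsigned fl = 0;

  assert (line != NULL);
  if (strstr (line, "; NFC_QC; N") != NULL)
    fl |= NOT_NFC;
  if (strstr (line, "; NFKC_QC; N") != NULL)
    fl |= NOT_NFKC;
  if (strstr (line, "; NFC_QC; M") != NULL)
    fl |= MAYBE_NOT_NFC;
  return fl;
}
```

Because "; NFKC_QC; N" does not contain "; NFC_QC; N" as a substring, the two tests cannot fire on the same property line by accident.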
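/* Write out the table.
The table consists of three fields per entry: the flags, the canonical
combining class, and the last Unicode code point to which the entry
applies. Each entry covers the code points after the previous entry's
last code point, up to and including its own. */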
static void
write_table (void)
{
unsigned i;
unsigned last_flag = flags[0];
bool really_safe = decomp[0][0] == 0;
unsigned char last_combine = combining_value[0];
for (i = 1; i <= 65536; i++)
if (i == 65536
|| (flags[i] != last_flag && ((flags[i] | last_flag) & (C99 | CXX)))
|| really_safe != (decomp[i][0] == 0)
|| combining_value[i] != last_combine)
{
printf ("{ %s|%s|%s|%s|%s|%s|%s, %3d, %#06x },\n",
last_flag & C99 ? "C99" : " 0",
last_flag & digit ? "DIG" : " 0",
last_flag & CXX ? "CXX" : " 0",
really_safe ? "CID" : " 0",
last_flag & not_NFC ? " 0" : "NFC",
last_flag & not_NFKC ? " 0" : "NKC",
last_flag & maybe_not_NFC ? "CTX" : " 0",
combining_value[i - 1],
i - 1);
last_flag = flags[i];
last_combine = combining_value[i];
really_safe = decomp[i][0] == 0;
}
}
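Since each emitted entry records the last code point it covers, a consumer can find the entry for a given character by binary search. The following is a minimal sketch under that assumption; `struct range_entry` and `find_entry` are hypothetical names, and the actual structure layout is whatever the generated ucnid.h and its users declare.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of one generated entry: flags, combining class,
   and the last code point the entry covers.  */
struct range_entry
{
  unsigned short flags;
  unsigned char combine;
  unsigned short end;
};

/* Hypothetical lookup: return the first entry whose END is >= C, that
   is, the entry covering C, or NULL if C lies past the last entry.
   TABLE must be sorted by increasing END.  */
static const struct range_entry *
find_entry (const struct range_entry *table, size_t n, unsigned c)
{
  size_t lo = 0, hi = n;

  assert (table != NULL && n > 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c <= table[mid].end)
        hi = mid;      /* Entry MID covers C or lies above it.  */
      else
        lo = mid + 1;  /* Entry MID ends below C.  */
    }
  return lo < n ? &table[lo] : NULL;
}
```

Storing only the upper bound works because the entries are contiguous: each one implicitly starts just after the previous entry ends.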
/* Print out the huge copyright notice. */
static void
write_copyright (void)
{
static const char copyright[] = "\
/* Unicode characters and various properties.\n\
Copyright (C) 2003, 2005 Free Software Foundation, Inc.\n\
\n\
This program is free software; you can redistribute it and/or modify it\n\
under the terms of the GNU General Public License as published by the\n\
Free Software Foundation; either version 2, or (at your option) any\n\
later version.\n\
\n\
This program is distributed in the hope that it will be useful,\n\
but WITHOUT ANY WARRANTY; without even the implied warranty of\n\
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n\
GNU General Public License for more details.\n\
\n\
You should have received a copy of the GNU General Public License\n\
along with this program; if not, write to the Free Software\n\
Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.\n\
\n\
\n\
Copyright (C) 1991-2005 Unicode, Inc. All rights reserved.\n\
Distributed under the Terms of Use in\n\
http://www.unicode.org/copyright.html.\n\
\n\
Permission is hereby granted, free of charge, to any person\n\
obtaining a copy of the Unicode data files and any associated\n\
documentation (the \"Data Files\") or Unicode software and any\n\
associated documentation (the \"Software\") to deal in the Data Files\n\
or Software without restriction, including without limitation the\n\
rights to use, copy, modify, merge, publish, distribute, and/or\n\
sell copies of the Data Files or Software, and to permit persons to\n\
whom the Data Files or Software are furnished to do so, provided\n\
that (a) the above copyright notice(s) and this permission notice\n\
appear with all copies of the Data Files or Software, (b) both the\n\
above copyright notice(s) and this permission notice appear in\n\
associated documentation, and (c) there is clear notice in each\n\
modified Data File or in the Software as well as in the\n\
documentation associated with the Data File(s) or Software that the\n\
data or software has been modified.\n\
\n\
THE DATA FILES AND SOFTWARE ARE PROVIDED \"AS IS\", WITHOUT WARRANTY\n\
OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE\n\
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n\
NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE\n\
COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR\n\
ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY\n\
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,\n\
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS\n\
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE\n\
OF THE DATA FILES OR SOFTWARE.\n\
\n\
Except as contained in this notice, the name of a copyright holder\n\
shall not be used in advertising or otherwise to promote the sale,\n\
use or other dealings in these Data Files or Software without prior\n\
written authorization of the copyright holder. */\n";
puts (copyright);
}
/* Main program. */
int
main(int argc, char ** argv)
{
if (argc != 4)
fail ("wrong number of arguments to makeucnid");
read_ucnid (argv[1]);
read_table (argv[2]);
read_derived (argv[3]);
write_copyright ();
write_table ();
return 0;
}

File diff suppressed because it is too large


@ -1,130 +0,0 @@
#! /usr/bin/perl -w
use strict;
# Convert cppucnid.tab to cppucnid.h. We use two arrays of length
# 65536 to represent the table, since this is nice and simple. The
# first array holds the tags indicating which ranges are valid in
# which contexts. The second array holds the language name associated
# with each element.
our(@tags, @names);
@tags = ("") x 65536;
@names = ("") x 65536;
# Array mapping tag numbers to standard #defines
our @stds;
# Current standard and language
our($curstd, $curlang);
# First block of the file is a template to be saved for later.
our @template;
while (<>) {
chomp;
last if $_ eq '%%';
push @template, $_;
};
# Second block of the file is the UCN tables.
# The format looks like this:
#
# [std]
#
# ; language
# xxxx-xxxx xxxx xxxx-xxxx ....
#
# with comment lines starting with #.
while (<>) {
chomp;
/^#/ and next;
/^\s*$/ and next;
/^\[(.+)\]$/ and do {
$curstd = $1;
next;
};
/^; (.+)$/ and do {
$curlang = $1;
next;
};
process_range(split);
}
# Print out the template, inserting as requested.
$\ = "\n";
for (@template) {
print("/* Automatically generated from cppucnid.tab, do not edit */"),
next if $_ eq "[dne]";
print_table(), next if $_ eq "[table]";
print;
}
sub print_table {
my($lo, $hi);
my $prevname = "";
for ($lo = 0; $lo <= $#tags; $lo = $hi) {
$hi = $lo;
$hi++ while $hi <= $#tags
&& $tags[$hi] eq $tags[$lo]
&& $names[$hi] eq $names[$lo];
# Range from $lo to $hi-1.
# Don't make entries for ranges that are not valid idchars.
next if ($tags[$lo] eq "");
my $tag = $tags[$lo];
$tag = " ".$tag if $tag =~ /^C99/;
if ($names[$lo] eq $prevname) {
printf(" { 0x%04x, 0x%04x, %-11s },\n",
$lo, $hi-1, $tag);
} else {
printf(" { 0x%04x, 0x%04x, %-11s }, /* %s */\n",
$lo, $hi-1, $tag, $names[$lo]);
}
$prevname = $names[$lo];
}
}
# The line is a list of four-digit hexadecimal numbers or
# pairs of such numbers. Each is a valid identifier character
# from the given language, under the given standard.
sub process_range {
for my $range (@_) {
if ($range =~ /^[0-9a-f]{4}$/) {
my $i = hex($range);
if ($tags[$i] eq "") {
$tags[$i] = $curstd;
} else {
$tags[$i] = $curstd . "|" . $tags[$i];
}
if ($names[$i] ne "" && $names[$i] ne $curlang) {
warn sprintf ("language overlap: %s/%s at %x (tag %s)",
$names[$i], $curlang, $i, $tags[$i]);
next;
}
$names[$i] = $curlang;
} elsif ($range =~ /^ ([0-9a-f]{4}) - ([0-9a-f]{4}) $/x) {
my ($start, $end) = (hex($1), hex($2));
my $i;
for ($i = $start; $i <= $end; $i++) {
if ($tags[$i] eq "") {
$tags[$i] = $curstd;
} else {
$tags[$i] = $curstd . "|" . $tags[$i];
}
if ($names[$i] ne "" && $names[$i] ne $curlang) {
warn sprintf ("language overlap: %s/%s at %x (tag %s)",
$names[$i], $curlang, $i, $tags[$i]);
next;
}
$names[$i] = $curlang;
}
} else {
warn "malformed range expression $range";
}
}
}


@ -1,47 +1,25 @@
/* Table of UCNs which are valid in identifiers.
Copyright (C) 2003 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
[dne]
/* This file reproduces the table in ISO/IEC 9899:1999 (C99) Annex
D, which is itself a reproduction from ISO/IEC TR 10176:1998, and
the similar table from ISO/IEC 14882:1998 (C++98) Annex E, which is
a reproduction of ISO/IEC PDTR 10176. Unfortunately these tables
are not identical. */
#ifndef LIBCPP_UCNID_H
#define LIBCPP_UCNID_H
#define C99 1
#define CXX 2
#define DIG 4
struct ucnrange
{
unsigned short lo, hi;
unsigned short flags;
};
static const struct ucnrange ucnranges[] = {
[table]
};
#endif /* LIBCPP_UCNID_H */
%%
; Table of UCNs which are valid in identifiers.
; Copyright (C) 2003, 2005 Free Software Foundation, Inc.
;
; This program is free software; you can redistribute it and/or modify it
; under the terms of the GNU General Public License as published by the
; Free Software Foundation; either version 2, or (at your option) any
; later version.
;
; This program is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
; GNU General Public License for more details.
;
; You should have received a copy of the GNU General Public License
; along with this program; if not, write to the Free Software
; Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
;
; This file reproduces the table in ISO/IEC 9899:1999 (C99) Annex
; D, which is itself a reproduction from ISO/IEC TR 10176:1998, and
; the similar table from ISO/IEC 14882:1998 (C++98) Annex E, which is
; a reproduction of ISO/IEC PDTR 10176. Unfortunately these tables
; are not identical.
[C99]
@ -141,7 +119,6 @@ ac00-d7a3
0b3d 1fbe 203f-2040 2102 2107 210a-2113 2115 2118-211d 2124 2126 2128
212a-2131 2133-2138 2160-2182 3005-3007 3021-3029
[C99|DIG]
; Digits
0660-0669 06f0-06f9 0966-096f 09e6-09ef 0a66-0a6f 0ae6-0aef 0b66-0b6f
0be7-0bef 0c66-0c6f 0ce6-0cef 0d66-0d6f 0e50-0e59 0ed0-0ed9 0f20-0f33
@ -201,16 +178,12 @@ ac00-d7a3
; Malayalam
0d05-0d0c 0d0e-0d10 0d12-0d28 0d2a-0d39 0d60-0d61
# CORRECTION: Exclude 0e50-0e59 from the Thai range and make a fake
# Digits range for it, to match C99. cppcharset.c knows that C++
# doesn't distinguish digits from other UCNs valid in identifiers.
; Thai
0e01-0e30 0e32-0e33 0e40-0e46 0e4f-0e49 0e5a-0e5b
0e01-0e30 0e32-0e33 0e40-0e46 0e4f-0e5b
; Digits
0e50-0e59
# CORRECTION: Change 0e0d to 0e8d (typo in standard; see C++ DR 131)
; Lao
0e81-0e82 0e84 0e87-0e88 0e8a 0e8d 0e94-0e97 0e99-0e9f 0ea1-0ea3 0ea5
0ea7 0eaa-0eab 0ead-0eb0 0eb2 0eb3 0ebd 0ec0-0ec4 0ec6
@ -224,7 +197,6 @@ ac00-d7a3
; Katakana
30a1-30fe
# CORRECTION: language spelled "Bopmofo" in C++98.
; Bopomofo
3105-312c