mirror of
https://github.com/gcc-mirror/gcc.git
synced 2024-11-21 13:40:47 +00:00
eba6d2aa71
The following patch implements the C23 N3017 "#embed - a scannable, tooling-friendly binary resource inclusion mechanism" paper.

The implementation is intentionally dumb, in that it doesn't significantly speed up compilation of larger initializers and doesn't make it possible to use huge #embeds (several gigabytes large, for which compile time and memory use are still infeasible). There are 2 reasons for this.

One is that I think the way it is implemented now in the patch is how we should handle the smaller #embed sizes; I don't know where the boundary should be, whether 32 bytes or 64 or something like that. Certainly handling the single byte case, which can appear anywhere in the source where a constant integer literal can appear, is desirable, and I think for a few bytes it isn't worth coming up with something smarter; users would also like to e.g. see it readably in -E output (perhaps the slow vs. fast boundary should be determined by a command line option).

The other one is to be able to more easily find regressions in behavior caused by the optimizations, so we have something to get back to in git to compare against.

I'm definitely willing to work on the optimizations (likely introduce a new CPP_* token type to refer to a range of libcpp owned memory (start + size) and similarly some tree which can do the same, and which can at any time be e.g. split into 2 subparts + say INTEGER_CST in between if needed, say for

  const unsigned char d[] = {
  #embed "2GB.dat" prefix (0, 0, ) suffix (, [0x40000000] = 42)
  };

still without having to copy around huge amounts of data; STRING_CST owns the memory it points to and can be only 2GB in size), but would like to do that incrementally.
And would like to first include some extensions also not included in this patch, like a gnu::offset (off) parameter to allow skipping a certain constant number of bytes at the start of the file, plus a gnu::base64 ("base64_encoded_data") parameter to store large amounts of #embed data more efficiently in preprocessed source.

I've been cross-checking all the tests also against the LLVM implementation https://github.com/llvm/llvm-project/pull/68620 which was even committed to LLVM trunk for a few hours but reverted afterwards. LLVM now has the support committed and I admit I haven't rechecked whether the behavior on the spots mentioned below has been fixed in it already or not yet.

The patch uses the --embed-dir= option that clang plans to add above and doesn't use other variants on the search directories yet, plus there are no default directories, at least for the time being, in which to search for embed files. So, #embed "..." works if the file is found in the same directory (or relative to the current file's directory) and #embed "/..." or #embed </...> always work, but relative #embed <...> doesn't unless at least one --embed-dir= is specified. There is no reason to differentiate between system and non-system directories, so we don't need an -isystem like counterpart; perhaps an -iquote like counterpart could be useful in the future, I don't know what else. The option has --embed-directory=dir and --embed-directory dir as aliases.

There are some differences beyond clang ICEs, so I'd like to point them out to make sure there is agreement on the choices in the patch. They are also mentioned in the comments of the llvm pull request.

The most important is that the GCC patch (as well as the original thephd.dev LLVM branch on godbolt) expands #embed (or acts as if it is expanded) into a mere sequence of numbers like 123,2,35,26 rather than what clang effectively treats as (unsigned char)123,(unsigned char)2,(unsigned char)35,(unsigned char)26, but clang only does that when using the integrated preprocessor, not when using -save-temps, where it acts as GCC. JeanHeyd as the original author agrees that is how it is currently worded in C23.

Another difference (not tested in the testsuite, not sure how to check for effective target /dev/urandom nor am I sure it is desirable to check that during testsuite) is how to treat character devices, named pipes etc. (block devices are errored on). The original paper uses /dev/urandom in various examples and seems to assume that unlike regular files the devices aren't really cached, so

  #embed </dev/urandom> limit(1) prefix(int a = ) suffix(;)
  #embed </dev/urandom> limit(1) prefix(int b = ) suffix(;)

usually results in a != b. That is what the godbolt thephd.dev branch implements too and what this patch does as well, but clang actually seems to just go from st.st_size == 0, ergo it must be a zero-sized resource, and so just copies over if_empty if present.

It is really questionable what to do about the character devices/named pipes with __has_embed; for regular files the patch doesn't read anything from them, it relies on st.st_size + limit for whether it is empty or non-empty. But I don't know of a way to check if read on say a character device would read anything or not (the </dev/null> limit (1) vs. </dev/zero> limit (1) cases), and if we read something, that would be better cached for later, because a later #embed, if it reads again, could read no further data even when it first read something. So, the patch currently for __has_embed just always returns 2 on the non-regular files, like the thephd.dev branch does as well and like the clang pull request as well.

A question is also what to do for gnu::offset on the non-regular files even for #embed; those aren't seekable, so do we want to just read and throw away the offset bytes each time we see it used?

clang also chokes on the

  #if __has_embed (__FILE__ __limit__ (1) __prefix__ () suffix (1 / 0) \
  __if_empty__ ((({{[0[0{0{0(0(0)1)1}1}]]}})))) != __STDC_EMBED_FOUND__
  #error "__has_embed fail"
  #endif

in embed-1.c, but the thephd.dev branch accepts it and I don't see why it shouldn't; (({{[0[0{0{0(0(0)1)1}1}]]}})) is a balanced token sequence and the file isn't empty, so it should just be parsed and discarded.

clang also IMHO mishandles

  const unsigned char w[] = {
  #embed __FILE__ prefix([0] = 42, [15] =) limit(32)
  };

but again only without -save-temps; it seems like it treats it as

  [0] = 42, [15] = (99,111,110,115,116,32,117,110,115,105,103,110,101,100,
                    32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98)

rather than

  [0] = 42, [15] = 99,111,110,115,116,32,117,110,115,105,103,110,101,100,
                   32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98

and warns on it for -Wunused-value and just compiles it as

  [0] = 42, [15] = 98

And also

  void foo (int, int, int, int);
  void bar (void) { foo (
  #embed __FILE__ limit (4) prefix (172 + ) suffix (+ 2)
  ); }

is treated as 172 + (118, 111, 105, 100) + 2 rather than 172 + 118, 111, 105, 100 + 2, which is how clang -save-temps or GCC treats it, so it results in just one argument passed rather than 4.

  if (!strstr ((const char *) magna_carta, "imprisonétur")) abort ();

in the testcase fails as well, but in that case calling it in gdb succeeds:

  p ((char *(*)(char *, char *))__strstr_sse2) (magna_carta, "imprisonétur")
  $2 = 0x555555558d3c <magna_carta+11564> "imprisonétur aut disseisiátur"...

so I guess they are just trying to constant evaluate strstr and do it incorrectly.

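The argument-count difference described above comes down to how C parses commas. A minimal sketch in plain C, with no #embed involved: the literal values 118, 111, 105, 100 stand in for the first four bytes of a file starting with "void", and the function names are hypothetical, chosen only for this illustration.

```c
/* Illustration only: 118,111,105,100 stand in for what
   `#embed __FILE__ limit (4)` would yield for a file starting "void".  */

/* Takes four separate arguments, as a plain comma-separated
   expansion would supply.  */
int
sum4 (int a, int b, int c, int d)
{
  return a + b + c + d;
}

/* Token-sequence reading: 172 + 118, 111, 105, 100 + 2.
   The commas separate four function arguments.  */
int
as_token_sequence (void)
{
  return sum4 (172 + 118, 111, 105, 100 + 2);
}

/* Single-expression reading: 172 + (118, 111, 105, 100) + 2.
   Inside parentheses the commas are comma operators, so only the
   last value (100) survives.  The (void) casts merely silence
   -Wunused-value; they do not change the result.  */
int
as_single_expression (void)
{
  return 172 + ((void) 118, (void) 111, (void) 105, 100) + 2;
}
```

Under the first reading all four byte values reach the callee; under the second only one value is produced, which matches the "one argument rather than 4" observation.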
They started with making the optimizations together in the initial patch set, so they don't have the luxury to compare if it is just because of the optimization they are trying to do or because that is how the feature works for them. At least unless they use -save-temps for now.

There is also different behavior between clang and gcc on -M or other dependency generating options. Seems clang includes the __has_embed searched files in dependencies, while my patch doesn't. But so does clang for __has_include and GCC doesn't. Emitting a hard dependency on some header just because there was __has_include/__has_embed for it seems wrong to me, because (at least when properly written) the source likely doesn't mind if the file is missing, it will do something else, so a hard error from make because of it doesn't seem right. Does make have some weaker dependencies, such that if some file can be remade it is but if it doesn't exist, it isn't fatal?

I wonder whether #embed <non-existent-file> really needs to be fatal or whether we could simply after diagnosing it pretend the file exists and is empty. For #include I think fatal errors make tons of sense, but perhaps for #embed which is more localized we'd get better error reporting if we didn't bail out immediately. Note, both GCC and clang currently treat those as fatal errors.

clang also added -dE option which with -E instead of preprocessing the #embed directives keeps them as is, but the preprocessed source then isn't self-contained. That option looks more harmful than useful to me.

Also, it isn't clear to me from C23 whether it is possible to have __has_include/__has_c_attribute/__has_embed expressions inside of the limit #embed/__has_embed argument. 6.10.3.2/2 says that defined should not appear there (and the patch diagnoses it and testsuite tests), but for __has_include/__has_embed etc.
6.10.1/11 says: "The identifiers __has_include, __has_embed, and __has_c_attribute shall not appear in any context not mentioned in this subclause." If that subclause in that case means 6.10.1, then it presumably shouldn't appear in #embed in 6.10.3, but __has_embed is in 6.10.1... But 6.10.3.2/3 says that it should be parsed according to the 6.10.1 rules. I haven't included tests like

  #if __has_embed (__FILE__ limit (__has_embed (__FILE__ limit (1))))

or

  #embed __FILE__ limit (__has_include (__FILE__))

into the testsuite because of the doubts, but I think the patch should handle those right now.

The reason I've used Magna Carta text in some of the testcases is that I hope it shouldn't be copyrighted after the centuries and I'd strongly prefer not to have binary blobs in git after the xz backdoor lesson and wanted something larger which doesn't change all the time.

Oh, BTW, I see in C23 draft 6.10.3.2 in Example 4

  if (f_source == NULL);
    return 1;

(note the spurious semicolon after closing paren), has that been fixed already?

Like the thephd.dev and clang implementations, the patch always macro expands the whole #embed and __has_embed directives except for the embed keyword. That is most likely not what C23 says; my limited understanding right now is that in #embed one needs to parse the whole directive line with macro expansion disabled and check if it satisfies the grammar. If not, the whole directive is macro expanded; if yes, only the limit parameter argument is macro expanded and the prefix/suffix/if_empty arguments are maybe macro expanded when actually used (and not at all if unused). And I think __has_embed macro expansion has conflicting rules.

2024-09-12  Jakub Jelinek  <jakub@redhat.com>

	PR c/105863
libcpp/
	* include/cpplib.h: Implement C23 N3017 #embed - a scannable, tooling-friendly binary resource inclusion mechanism paper.
	(struct cpp_options): Add embed member.
	(enum cpp_builtin_type): Add BT_HAS_EMBED.
	(cpp_set_include_chains): Add another cpp_dir * argument to the declaration.
	* internal.h (enum include_type): Add IT_EMBED.
	(struct cpp_reader): Add embed_include member.
	(struct cpp_embed_params_tokens): New type.
	(struct cpp_embed_params): New type.
	(_cpp_get_token_no_padding): Declare.
	(enum _cpp_find_file_kind): Add _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
	(_cpp_stack_embed): Declare.
	(_cpp_parse_expr): Change return type to cpp_num_part instead of bool, change second argument from bool to const char * and add third argument.
	(_cpp_parse_embed_params): Declare.
	* directives.cc (DIRECTIVE_TABLE): Add embed entry.
	(end_directive): Don't call skip_rest_of_line for T_EMBED directive.
	(_cpp_handle_directive): Return 2 rather than 1 for T_EMBED in directives-only mode.
	(parse_include): Don't call check_eol for T_EMBED directive.
	(skip_balanced_token_seq): New function.
	(EMBED_PARAMS): Define.
	(enum embed_param_kind): New type.
	(embed_params): New variable.
	(_cpp_parse_embed_params): New function.
	(do_embed): New function.
	(do_if): Adjust _cpp_parse_expr caller.
	(do_elif): Likewise.
	* expr.cc (parse_defined): Diagnose defined in #embed or __has_embed parameters.
	(_cpp_parse_expr): Change return type to cpp_num_part instead of bool, change second argument from bool to const char * and add third argument. Adjust function comment. For #embed/__has_embed parameters add an artificial CPP_OPEN_PAREN. Use the second argument DIR directly instead of string literals conditional on IS_IF. For #embed/__has_embed parameter, stop on reaching CPP_CLOSE_PAREN matching the artificial one. Diagnose negative or too large embed parameter operands.
	(num_binary_op): Use #embed instead of #if for diagnostics if inside #embed/__has_embed parameter.
	(num_div_op): Likewise.
	* files.cc (struct _cpp_file): Add limit member and embed bitfield.
	(search_cache): Add IS_EMBED argument, formatting fix. Skip over files with different file->embed from the argument.
	(find_file_in_dir): Don't call pch_open_file if file->embed.
	(_cpp_find_file): Handle _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
	(read_file_guts): Formatting fix.
	(has_unique_contents): Ignore file->embed files.
	(search_path_head): Handle IT_EMBED type.
	(_cpp_stack_embed): New function.
	(_cpp_get_file_stat): Formatting fix.
	(cpp_set_include_chains): Add embed argument, save it to pfile->embed_include and compute lens for the chain.
	* init.cc (struct lang_flags): Add embed member.
	(lang_defaults): Add embed initializers.
	(cpp_set_lang): Initialize CPP_OPTION (pfile, embed).
	(builtin_array): Add __has_embed entry.
	(cpp_init_builtins): Predefine __STDC_EMBED_NOT_FOUND__, __STDC_EMBED_FOUND__ and __STDC_EMBED_EMPTY__.
	* lex.cc (cpp_directive_only_process): Handle #embed.
	* macro.cc (cpp_get_token_no_padding): Rename to ...
	(_cpp_get_token_no_padding): ... this. No longer static.
	(builtin_has_include_1): New function.
	(builtin_has_include): Use it. Use _cpp_get_token_no_padding instead of cpp_get_token_no_padding.
	(builtin_has_embed): New function.
	(_cpp_builtin_macro_text): Handle BT_HAS_EMBED.
gcc/
	* doc/cppdiropts.texi (--embed-dir=): Document.
	* doc/cpp.texi (Binary Resource Inclusion): New chapter.
	(__has_embed): Document.
	* doc/invoke.texi (Directory Options): Mention --embed-dir=.
	* gcc.cc (cpp_unique_options): Add %{-embed*}.
	* genmatch.cc (main): Adjust cpp_set_include_chains caller.
	* incpath.h (enum incpath_kind): Add INC_EMBED.
	* incpath.cc (merge_include_chains): Handle INC_EMBED.
	(register_include_chains): Adjust cpp_set_include_chains caller.
gcc/c-family/
	* c.opt (-embed-dir=): New option.
	(-embed-directory): New alias.
	(-embed-directory=): New alias.
	* c-opts.cc (c_common_handle_option): Handle OPT__embed_dir_.
gcc/testsuite/
	* c-c++-common/cpp/embed-1.c: New test.
	* c-c++-common/cpp/embed-2.c: New test.
	* c-c++-common/cpp/embed-3.c: New test.
	* c-c++-common/cpp/embed-4.c: New test.
	* c-c++-common/cpp/embed-5.c: New test.
	* c-c++-common/cpp/embed-6.c: New test.
	* c-c++-common/cpp/embed-7.c: New test.
	* c-c++-common/cpp/embed-8.c: New test.
	* c-c++-common/cpp/embed-9.c: New test.
	* c-c++-common/cpp/embed-10.c: New test.
	* c-c++-common/cpp/embed-11.c: New test.
	* c-c++-common/cpp/embed-12.c: New test.
	* c-c++-common/cpp/embed-13.c: New test.
	* c-c++-common/cpp/embed-14.c: New test.
	* c-c++-common/cpp/embed-25.c: New test.
	* c-c++-common/cpp/embed-26.c: New test.
	* c-c++-common/cpp/embed-dir/embed-1.inc: New test.
	* c-c++-common/cpp/embed-dir/embed-3.c: New test.
	* c-c++-common/cpp/embed-dir/embed-4.c: New test.
	* c-c++-common/cpp/embed-dir/magna-carta.txt: New test.
	* gcc.dg/cpp/embed-1.c: New test.
	* gcc.dg/cpp/embed-2.c: New test.
	* gcc.dg/cpp/embed-3.c: New test.
	* gcc.dg/cpp/embed-4.c: New test.
	* g++.dg/cpp/embed-1.C: New test.
	* g++.dg/cpp/embed-2.C: New test.
	* g++.dg/cpp/embed-3.C: New test.
960 lines
32 KiB
C++
/* CPP Library.
   Copyright (C) 1986-2024 Free Software Foundation, Inc.
   Contributed by Per Bothner, 1994-95.
   Based on CCCP program by Paul Rubin, June 1986
   Adapted to ANSI C, Richard Stallman, Jan 1987

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

#include "config.h"
#include "system.h"
#include "cpplib.h"
#include "internal.h"
#include "mkdeps.h"
#include "localedir.h"
#include "filenames.h"

#ifndef ENABLE_CANONICAL_SYSTEM_HEADERS
#ifdef HAVE_DOS_BASED_FILE_SYSTEM
#define ENABLE_CANONICAL_SYSTEM_HEADERS 1
#else
#define ENABLE_CANONICAL_SYSTEM_HEADERS 0
#endif
#endif

static void init_library (void);
static void mark_named_operators (cpp_reader *, int);
static bool read_original_filename (cpp_reader *);
static void read_original_directory (cpp_reader *);
static void post_options (cpp_reader *);

/* If we have designated initializers (GCC >2.7) these tables can be
   initialized, constant data.  Otherwise, they have to be filled in at
   runtime.  */
#if HAVE_DESIGNATED_INITIALIZERS

#define init_trigraph_map()  /* Nothing.  */
#define TRIGRAPH_MAP \
__extension__ const uchar _cpp_trigraph_map[UCHAR_MAX + 1] = {

#define END };
#define s(p, v) [p] = v,

#else

#define TRIGRAPH_MAP uchar _cpp_trigraph_map[UCHAR_MAX + 1] = { 0 }; \
 static void init_trigraph_map (void) { \
 unsigned char *x = _cpp_trigraph_map;

#define END }
#define s(p, v) x[p] = v;

#endif

TRIGRAPH_MAP
  s('=', '#') s(')', ']') s('!', '|')
  s('(', '[') s('\'', '^') s('>', '}')
  s('/', '\\') s('<', '{') s('-', '~')
END

#undef s
#undef END
#undef TRIGRAPH_MAP

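The designated-initializer branch of the macros above expands to an array indexed by character value. A minimal standalone sketch of that idea follows; `trigraph_map` and `trigraph_replacement` are hypothetical names for illustration and are not libcpp's API, and the treatment of 0 as "not a trigraph" is an assumption of this sketch.

```c
#include <limits.h>

/* Hypothetical stand-in for _cpp_trigraph_map: indexed by the third
   character of a "??x" sequence.  Entries that are not listed are
   zero-initialized, which this sketch treats as "not a trigraph".  */
static const unsigned char trigraph_map[UCHAR_MAX + 1] = {
  ['='] = '#',  [')'] = ']',  ['!'] = '|',
  ['('] = '[',  ['\''] = '^', ['>'] = '}',
  ['/'] = '\\', ['<'] = '{',  ['-'] = '~',
};

/* Return the replacement character for the trigraph "??C",
   or 0 if "??C" is not a trigraph.  */
unsigned char
trigraph_replacement (unsigned char c)
{
  return trigraph_map[c];
}
```

Because the table is constant data, no runtime initialization is needed, which is why `init_trigraph_map()` expands to nothing in that branch.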
/* A set of booleans indicating what CPP features each source language
   requires.  */
struct lang_flags
{
  unsigned int c99 : 1;
  unsigned int cplusplus : 1;
  unsigned int extended_numbers : 1;
  unsigned int extended_identifiers : 1;
  unsigned int c11_identifiers : 1;
  unsigned int xid_identifiers : 1;
  unsigned int std : 1;
  unsigned int digraphs : 1;
  unsigned int uliterals : 1;
  unsigned int rliterals : 1;
  unsigned int user_literals : 1;
  unsigned int binary_constants : 1;
  unsigned int digit_separators : 1;
  unsigned int trigraphs : 1;
  unsigned int utf8_char_literals : 1;
  unsigned int va_opt : 1;
  unsigned int scope : 1;
  unsigned int dfp_constants : 1;
  unsigned int size_t_literals : 1;
  unsigned int elifdef : 1;
  unsigned int warning_directive : 1;
  unsigned int delimited_escape_seqs : 1;
  unsigned int true_false : 1;
  unsigned int embed : 1;
};

static const struct lang_flags lang_defaults[] = {
  /*                                           u         e w
                                         b d   8         l a   t
                             x         u i i   c v s   s i r d r e
                       x     i   d u r d n g t h a c   z f n e u m
                   c c n x c d s i l l l c s r l o o d l d d l f b
                   9 + u i 1 i t g i i i s e i i p p f i e i i a e
                   9 + m d 1 d d r t t t t p g t t e p t f r m l d  */
  /* GNUC89   */ { 0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0 },
  /* GNUC99   */ { 1,0,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0 },
  /* GNUC11   */ { 1,0,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0 },
  /* GNUC17   */ { 1,0,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0 },
  /* GNUC23   */ { 1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1,1 },
  /* GNUC2Y   */ { 1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1,1 },
  /* STDC89   */ { 0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 },
  /* STDC94   */ { 0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 },
  /* STDC99   */ { 1,0,1,1,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 },
  /* STDC11   */ { 1,0,1,1,1,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 },
  /* STDC17   */ { 1,0,1,1,1,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 },
  /* STDC23   */ { 1,0,1,1,1,1,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,0,1,1 },
  /* STDC2Y   */ { 1,0,1,1,1,1,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,0,1,1 },
  /* GNUCXX   */ { 0,1,1,1,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0 },
  /* CXX98    */ { 0,1,0,1,0,1,1,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0 },
  /* GNUCXX11 */ { 1,1,1,1,1,1,0,1,1,1,1,0,0,0,0,1,1,0,0,0,0,0,1,0 },
  /* CXX11    */ { 1,1,0,1,1,1,1,1,1,1,1,0,0,1,0,0,1,0,0,0,0,0,1,0 },
  /* GNUCXX14 */ { 1,1,1,1,1,1,0,1,1,1,1,1,1,0,0,1,1,0,0,0,0,0,1,0 },
  /* CXX14    */ { 1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,0,0,0,1,0 },
  /* GNUCXX17 */ { 1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,1,0 },
  /* CXX17    */ { 1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,0,0,0,0,0,1,0 },
  /* GNUCXX20 */ { 1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,1,0 },
  /* CXX20    */ { 1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,1,0 },
  /* GNUCXX23 */ { 1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,0 },
  /* CXX23    */ { 1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,0 },
  /* GNUCXX26 */ { 1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,0 },
  /* CXX26    */ { 1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,0 },
  /* ASM      */ { 0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 }
};

/* Sets internal flags correctly for a given language.  */
void
cpp_set_lang (cpp_reader *pfile, enum c_lang lang)
{
  const struct lang_flags *l = &lang_defaults[(int) lang];

  CPP_OPTION (pfile, lang) = lang;

  CPP_OPTION (pfile, c99) = l->c99;
  CPP_OPTION (pfile, cplusplus) = l->cplusplus;
  CPP_OPTION (pfile, extended_numbers) = l->extended_numbers;
  CPP_OPTION (pfile, extended_identifiers) = l->extended_identifiers;
  CPP_OPTION (pfile, c11_identifiers) = l->c11_identifiers;
  CPP_OPTION (pfile, xid_identifiers) = l->xid_identifiers;
  CPP_OPTION (pfile, std) = l->std;
  CPP_OPTION (pfile, digraphs) = l->digraphs;
  CPP_OPTION (pfile, uliterals) = l->uliterals;
  CPP_OPTION (pfile, rliterals) = l->rliterals;
  CPP_OPTION (pfile, user_literals) = l->user_literals;
  CPP_OPTION (pfile, binary_constants) = l->binary_constants;
  CPP_OPTION (pfile, digit_separators) = l->digit_separators;
  CPP_OPTION (pfile, trigraphs) = l->trigraphs;
  CPP_OPTION (pfile, utf8_char_literals) = l->utf8_char_literals;
  CPP_OPTION (pfile, va_opt) = l->va_opt;
  CPP_OPTION (pfile, scope) = l->scope;
  CPP_OPTION (pfile, dfp_constants) = l->dfp_constants;
  CPP_OPTION (pfile, size_t_literals) = l->size_t_literals;
  CPP_OPTION (pfile, elifdef) = l->elifdef;
  CPP_OPTION (pfile, warning_directive) = l->warning_directive;
  CPP_OPTION (pfile, delimited_escape_seqs) = l->delimited_escape_seqs;
  CPP_OPTION (pfile, true_false) = l->true_false;
  CPP_OPTION (pfile, embed) = l->embed;
}

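The pattern used by `lang_defaults` and `cpp_set_lang` is a table of per-language feature bits indexed by a language enumerator. A cut-down sketch under stated assumptions: every `demo_*` / `DEMO_*` name is hypothetical, only four of the 24 flags are kept, and the four rows copy the corresponding columns (c99, c++, trig, embed) from the table above.

```c
/* Hypothetical, cut-down language list; the order must match the
   rows of demo_defaults below.  */
enum demo_lang
{
  DEMO_GNUC89,
  DEMO_STDC89,
  DEMO_STDC23,
  DEMO_CXX23,
  DEMO_LANG_COUNT
};

/* One bit per feature, as in struct lang_flags.  */
struct demo_flags
{
  unsigned int c99 : 1;
  unsigned int cplusplus : 1;
  unsigned int trigraphs : 1;
  unsigned int embed : 1;
};

static const struct demo_flags demo_defaults[DEMO_LANG_COUNT] = {
  /*                 c99 c++ trig embed */
  /* DEMO_GNUC89 */ { 0,  0,  0,   0 },
  /* DEMO_STDC89 */ { 0,  0,  1,   0 },
  /* DEMO_STDC23 */ { 1,  0,  0,   1 },
  /* DEMO_CXX23  */ { 1,  1,  0,   0 },
};

/* Hypothetical stand-in for the option part of cpp_reader.  */
struct demo_reader
{
  enum demo_lang lang;
  struct demo_flags opts;
};

/* Copy the defaults for LANG into READER, field by field, mirroring
   the shape of cpp_set_lang.  */
void
demo_set_lang (struct demo_reader *reader, enum demo_lang lang)
{
  const struct demo_flags *l = &demo_defaults[(int) lang];

  reader->lang = lang;
  reader->opts.c99 = l->c99;
  reader->opts.cplusplus = l->cplusplus;
  reader->opts.trigraphs = l->trigraphs;
  reader->opts.embed = l->embed;
}
```

Adding a feature in this scheme means adding one bit-field, one column in every row, and one assignment in the setter, which is what the patch does for `embed`.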
/* Initialize library global state.  */
static void
init_library (void)
{
  static int initialized = 0;

  if (! initialized)
    {
      initialized = 1;

      _cpp_init_lexer ();

      /* Set up the trigraph map.  This doesn't need to do anything if
         we were compiled with a compiler that supports C99 designated
         initializers.  */
      init_trigraph_map ();

#ifdef ENABLE_NLS
       (void) bindtextdomain (PACKAGE, LOCALEDIR);
#endif
    }
}

/* Initialize a cpp_reader structure.  */
cpp_reader *
cpp_create_reader (enum c_lang lang, cpp_hash_table *table,
                   class line_maps *line_table, cpp_hash_table *extra_table)
{
  cpp_reader *pfile;

  /* Initialize this instance of the library if it hasn't been already.  */
  init_library ();

  pfile = XCNEW (cpp_reader);
  memset (&pfile->base_context, 0, sizeof (pfile->base_context));

  cpp_set_lang (pfile, lang);
  CPP_OPTION (pfile, warn_multichar) = 1;
  CPP_OPTION (pfile, discard_comments) = 1;
  CPP_OPTION (pfile, discard_comments_in_macro_exp) = 1;
  CPP_OPTION (pfile, max_include_depth) = 200;
  CPP_OPTION (pfile, operator_names) = 1;
  CPP_OPTION (pfile, warn_trigraphs) = 2;
  CPP_OPTION (pfile, warn_endif_labels) = 1;
  CPP_OPTION (pfile, cpp_warn_c90_c99_compat) = -1;
  CPP_OPTION (pfile, cpp_warn_c11_c23_compat) = -1;
  CPP_OPTION (pfile, cpp_warn_cxx11_compat) = 0;
  CPP_OPTION (pfile, cpp_warn_cxx20_compat) = 0;
  CPP_OPTION (pfile, cpp_warn_deprecated) = 1;
  CPP_OPTION (pfile, cpp_warn_long_long) = 0;
  CPP_OPTION (pfile, dollars_in_ident) = 1;
  CPP_OPTION (pfile, warn_dollars) = 1;
  CPP_OPTION (pfile, warn_variadic_macros) = 1;
  CPP_OPTION (pfile, warn_builtin_macro_redefined) = 1;
  CPP_OPTION (pfile, cpp_warn_implicit_fallthrough) = 0;
  /* By default, track locations of tokens resulting from macro
     expansion.  The '2' means, track the locations with the highest
     accuracy.  Read the comments for struct
     cpp_options::track_macro_expansion to learn about the other
     values.  */
  CPP_OPTION (pfile, track_macro_expansion) = 2;
  CPP_OPTION (pfile, warn_normalize) = normalized_C;
  CPP_OPTION (pfile, warn_literal_suffix) = 1;
  CPP_OPTION (pfile, canonical_system_headers)
      = ENABLE_CANONICAL_SYSTEM_HEADERS;
  CPP_OPTION (pfile, ext_numeric_literals) = 1;
  CPP_OPTION (pfile, warn_date_time) = 0;
  CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
  CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
  CPP_OPTION (pfile, cpp_warn_unicode) = 1;
  CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;

  /* Default CPP arithmetic to something sensible for the host for the
     benefit of dumb users like fix-header.  */
  CPP_OPTION (pfile, precision) = CHAR_BIT * sizeof (long);
  CPP_OPTION (pfile, char_precision) = CHAR_BIT;
  CPP_OPTION (pfile, wchar_precision) = CHAR_BIT * sizeof (int);
  CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int);
  CPP_OPTION (pfile, unsigned_char) = 0;
  CPP_OPTION (pfile, unsigned_wchar) = 1;
  CPP_OPTION (pfile, unsigned_utf8char) = 1;
  CPP_OPTION (pfile, bytes_big_endian) = 1;  /* does not matter */

  /* Default to no charset conversion.  */
  CPP_OPTION (pfile, narrow_charset) = _cpp_default_encoding ();
  CPP_OPTION (pfile, wide_charset) = 0;

  /* Default the input character set to UTF-8.  */
  CPP_OPTION (pfile, input_charset) = _cpp_default_encoding ();

  /* A fake empty "directory" used as the starting point for files
     looked up without a search path.  Name cannot be '/' because we
     don't want to prepend anything at all to filenames using it.  All
     other entries are correct zero-initialized.  */
  pfile->no_search_path.name = (char *) "";

  /* Initialize the line map.  */
  pfile->line_table = line_table;

  /* Initialize lexer state.  */
  pfile->state.save_comments = ! CPP_OPTION (pfile, discard_comments);

  /* Set up static tokens.  */
  pfile->avoid_paste.type = CPP_PADDING;
  pfile->avoid_paste.val.source = NULL;
  pfile->avoid_paste.src_loc = 0;
  pfile->endarg.type = CPP_EOF;
  pfile->endarg.flags = 0;
  pfile->endarg.src_loc = 0;

  /* Create a token buffer for the lexer.  */
  _cpp_init_tokenrun (&pfile->base_run, 250);
  pfile->cur_run = &pfile->base_run;
  pfile->cur_token = pfile->base_run.base;

  /* Initialize the base context.  */
  pfile->context = &pfile->base_context;
  pfile->base_context.c.macro = 0;
  pfile->base_context.prev = pfile->base_context.next = 0;

  /* Aligned and unaligned storage.  */
  pfile->a_buff = _cpp_get_buff (pfile, 0);
  pfile->u_buff = _cpp_get_buff (pfile, 0);

  /* Initialize table for push_macro/pop_macro.  */
  pfile->pushed_macros = 0;

  /* Do not force token locations by default.  */
  pfile->forced_token_location = 0;

  /* Note the timestamp is unset.  */
  pfile->time_stamp = time_t (-1);
  pfile->time_stamp_kind = 0;

  /* The expression parser stack.  */
  _cpp_expand_op_stack (pfile);

  /* Initialize the buffer obstack.  */
  obstack_specify_allocation (&pfile->buffer_ob, 0, 0, xmalloc, free);

  _cpp_init_files (pfile);

  _cpp_init_hashtable (pfile, table, extra_table);

  return pfile;
}

/* Set the line_table entry in PFILE.  This is called after reading a
   PCH file, as the old line_table will be incorrect.  */
void
cpp_set_line_map (cpp_reader *pfile, class line_maps *line_table)
{
  pfile->line_table = line_table;
}

/* Free resources used by PFILE.  Accessing PFILE after this function
   returns leads to undefined behavior.  */
void
cpp_destroy (cpp_reader *pfile)
{
  cpp_context *context, *contextn;
  struct def_pragma_macro *pmacro;
  tokenrun *run, *runn;
  int i;

  free (pfile->op_stack);

  while (CPP_BUFFER (pfile) != NULL)
    _cpp_pop_buffer (pfile);

  free (pfile->out.base);

  if (pfile->macro_buffer)
    {
      free (pfile->macro_buffer);
      pfile->macro_buffer = NULL;
      pfile->macro_buffer_len = 0;
    }

  if (pfile->deps)
    deps_free (pfile->deps);
  obstack_free (&pfile->buffer_ob, 0);

  _cpp_destroy_hashtable (pfile);
  _cpp_cleanup_files (pfile);
  _cpp_destroy_iconv (pfile);

  _cpp_free_buff (pfile->a_buff);
  _cpp_free_buff (pfile->u_buff);
  _cpp_free_buff (pfile->free_buffs);

  for (run = &pfile->base_run; run; run = runn)
    {
      runn = run->next;
      free (run->base);
      if (run != &pfile->base_run)
        free (run);
    }

  for (context = pfile->base_context.next; context; context = contextn)
    {
      contextn = context->next;
      free (context);
    }

  if (pfile->comments.entries)
    {
      for (i = 0; i < pfile->comments.count; i++)
        free (pfile->comments.entries[i].comment);

      free (pfile->comments.entries);
    }
  if (pfile->pushed_macros)
    {
      do
        {
          pmacro = pfile->pushed_macros;
          pfile->pushed_macros = pmacro->next;
          free (pmacro->name);
          free (pmacro);
        }
      while (pfile->pushed_macros);
    }

  free (pfile);
}

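The `run`/`runn` and `context`/`contextn` loops in `cpp_destroy` both read the `next` link before freeing the current element. A minimal sketch of that idiom on a generic singly linked list; `demo_node`, `demo_push` and `demo_free_list` are hypothetical names, not part of libcpp.

```c
#include <stdlib.h>

/* Hypothetical list node; stands in for tokenrun or cpp_context.  */
struct demo_node
{
  struct demo_node *next;
};

/* Prepend a freshly allocated node to LIST and return the new head.
   On allocation failure LIST is returned unchanged.  */
struct demo_node *
demo_push (struct demo_node *list)
{
  struct demo_node *node = malloc (sizeof *node);

  if (node == NULL)
    return list;
  node->next = list;
  return node;
}

/* Free every node reachable from LIST and return how many were freed.  */
size_t
demo_free_list (struct demo_node *list)
{
  struct demo_node *node, *next;
  size_t freed = 0;

  for (node = list; node; node = next)
    {
      /* Read the link first: NODE must not be touched after free.  */
      next = node->next;
      free (node);
      freed++;
    }
  return freed;
}
```

Writing the loop as `node = node->next` in the `for` header instead would read freed memory, which is the bug the saved pointer avoids.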
/* This structure defines one built-in identifier.  A node will be
   entered in the hash table under the name NAME, with value VALUE.

   There are two tables of these.  builtin_array holds all the
   "builtin" macros: these are handled by builtin_macro() in
   macro.cc.  Builtin is somewhat of a misnomer -- the property of
   interest is that these macros require special code to compute their
   expansions.  The value is a "cpp_builtin_type" enumerator.

   operator_array holds the C++ named operators.  These are keywords
   which act as aliases for punctuators.  In C++, they cannot be
   altered through #define, and #if recognizes them as operators.  In
   C, these are not entered into the hash table at all (but see
   <iso646.h>).  The value is a token-type enumerator.  */
struct builtin_macro
{
  const uchar *const name;
  const unsigned short len;
  const unsigned short value;
  const bool always_warn_if_redefined;
};

#define B(n, t, f)    { DSC(n), t, f }
static const struct builtin_macro builtin_array[] =
{
  B("__TIMESTAMP__",       BT_TIMESTAMP,        false),
  B("__TIME__",            BT_TIME,             false),
  B("__DATE__",            BT_DATE,             false),
  B("__FILE__",            BT_FILE,             false),
  B("__FILE_NAME__",       BT_FILE_NAME,        false),
  B("__BASE_FILE__",       BT_BASE_FILE,        false),
  B("__LINE__",            BT_SPECLINE,         true),
  B("__INCLUDE_LEVEL__",   BT_INCLUDE_LEVEL,    true),
  B("__COUNTER__",         BT_COUNTER,          true),
  /* Make sure to update the list of built-in
     function-like macros in traditional.cc:
     fun_like_macro() when adding more following */
  B("__has_attribute",     BT_HAS_ATTRIBUTE,    true),
  B("__has_c_attribute",   BT_HAS_STD_ATTRIBUTE, true),
  B("__has_cpp_attribute", BT_HAS_ATTRIBUTE,    true),
  B("__has_builtin",       BT_HAS_BUILTIN,      true),
  B("__has_include",       BT_HAS_INCLUDE,      true),
  B("__has_include_next",  BT_HAS_INCLUDE_NEXT, true),
  B("__has_embed",         BT_HAS_EMBED,        true),
  B("__has_feature",       BT_HAS_FEATURE,      true),
  B("__has_extension",     BT_HAS_EXTENSION,    true),
  /* Keep builtins not used for -traditional-cpp at the end, and
     update init_builtins() if any more are added.  */
  B("_Pragma",             BT_PRAGMA,           true),
  B("__STDC__",            BT_STDC,             true),
};
#undef B

struct builtin_operator
{
  const uchar *const name;
  const unsigned short len;
  const unsigned short value;
};

#define B(n, t) { DSC(n), t }
static const struct builtin_operator operator_array[] =
{
  B("and", CPP_AND_AND),
  B("and_eq", CPP_AND_EQ),
  B("bitand", CPP_AND),
  B("bitor", CPP_OR),
  B("compl", CPP_COMPL),
  B("not", CPP_NOT),
  B("not_eq", CPP_NOT_EQ),
  B("or", CPP_OR_OR),
  B("or_eq", CPP_OR_EQ),
  B("xor", CPP_XOR),
  B("xor_eq", CPP_XOR_EQ)
};
#undef B

/* Mark the C++ named operators in the hash table. */
static void
mark_named_operators (cpp_reader *pfile, int flags)
{
  const struct builtin_operator *b;

  for (b = operator_array;
       b < (operator_array + ARRAY_SIZE (operator_array));
       b++)
    {
      cpp_hashnode *hp = cpp_lookup (pfile, b->name, b->len);
      hp->flags |= flags;
      hp->is_directive = 0;
      hp->directive_index = b->value;
    }
}

/* Helper function of cpp_type2name. Return the string associated with
   named operator TYPE. */
const char *
cpp_named_operator2name (enum cpp_ttype type)
{
  const struct builtin_operator *b;

  for (b = operator_array;
       b < (operator_array + ARRAY_SIZE (operator_array));
       b++)
    {
      if (type == b->value)
        return (const char *) b->name;
    }

  return NULL;
}

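As a user-level illustration of what operator_array describes, the sketch below writes the same expressions once with punctuators and once with the C++ named operators. It is not part of libcpp; the function names are hypothetical and chosen only for this example, and it assumes a compiler that accepts the alternative tokens by default (as GCC does in C++ mode).

```cpp
#include <cassert>

// Hypothetical helpers, not libcpp code: the same bit manipulation
// written with punctuators and with the named operators.
unsigned
mask_with_punctuators (unsigned a, unsigned b)
{
  unsigned r = a & b;   // CPP_AND
  r |= ~a;              // CPP_OR_EQ, CPP_COMPL
  r ^= b;               // CPP_XOR_EQ
  return r;
}

unsigned
mask_with_named_operators (unsigned a, unsigned b)
{
  unsigned r = a bitand b;
  r or_eq compl a;
  r xor_eq b;
  return r;
}

// Logical named operators: and (&&), or (||), not (!), not_eq (!=).
bool
logic_with_named_operators (bool x, bool y)
{
  return (x and not y) or (x not_eq y);
}
```

Both mask functions should produce identical results for any inputs, since each named operator is only an alternative spelling of the punctuator in the right-hand column of operator_array.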
void
cpp_init_special_builtins (cpp_reader *pfile)
{
  const struct builtin_macro *b;
  size_t n = ARRAY_SIZE (builtin_array);

  if (CPP_OPTION (pfile, traditional))
    n -= 2;
  else if (! CPP_OPTION (pfile, stdc_0_in_system_headers)
           || CPP_OPTION (pfile, std))
    n--;

  for (b = builtin_array; b < builtin_array + n; b++)
    {
      if ((b->value == BT_HAS_ATTRIBUTE
           || b->value == BT_HAS_STD_ATTRIBUTE
           || b->value == BT_HAS_BUILTIN)
          && (CPP_OPTION (pfile, lang) == CLK_ASM
              || pfile->cb.has_attribute == NULL))
        continue;
      cpp_hashnode *hp = cpp_lookup (pfile, b->name, b->len);
      hp->type = NT_BUILTIN_MACRO;
      if (b->always_warn_if_redefined)
        hp->flags |= NODE_WARN;
      hp->value.builtin = (enum cpp_builtin_type) b->value;
    }
}

/* Restore macro C to builtin macro definition. */

void
_cpp_restore_special_builtin (cpp_reader *pfile, struct def_pragma_macro *c)
{
  size_t len = strlen (c->name);

  for (const struct builtin_macro *b = builtin_array;
       b < builtin_array + ARRAY_SIZE (builtin_array); b++)
    if (b->len == len && memcmp (c->name, b->name, len + 1) == 0)
      {
        cpp_hashnode *hp = cpp_lookup (pfile, b->name, b->len);
        hp->type = NT_BUILTIN_MACRO;
        if (b->always_warn_if_redefined)
          hp->flags |= NODE_WARN;
        hp->value.builtin = (enum cpp_builtin_type) b->value;
      }
}

/* Read the builtins table above and enter them, and language-specific
   macros, into the hash table. HOSTED is true if this is a hosted
   environment. */
void
cpp_init_builtins (cpp_reader *pfile, int hosted)
{
  cpp_init_special_builtins (pfile);

  if (!CPP_OPTION (pfile, traditional)
      && (! CPP_OPTION (pfile, stdc_0_in_system_headers)
          || CPP_OPTION (pfile, std)))
    _cpp_define_builtin (pfile, "__STDC__ 1");

  if (CPP_OPTION (pfile, cplusplus))
    {
      /* C++26 is not yet a standard. For now, use an invalid
         year/month, 202400L, which is larger than 202302L. */
      if (CPP_OPTION (pfile, lang) == CLK_CXX26
          || CPP_OPTION (pfile, lang) == CLK_GNUCXX26)
        _cpp_define_builtin (pfile, "__cplusplus 202400L");
      else if (CPP_OPTION (pfile, lang) == CLK_CXX23
               || CPP_OPTION (pfile, lang) == CLK_GNUCXX23)
        _cpp_define_builtin (pfile, "__cplusplus 202302L");
      else if (CPP_OPTION (pfile, lang) == CLK_CXX20
               || CPP_OPTION (pfile, lang) == CLK_GNUCXX20)
        _cpp_define_builtin (pfile, "__cplusplus 202002L");
      else if (CPP_OPTION (pfile, lang) == CLK_CXX17
               || CPP_OPTION (pfile, lang) == CLK_GNUCXX17)
        _cpp_define_builtin (pfile, "__cplusplus 201703L");
      else if (CPP_OPTION (pfile, lang) == CLK_CXX14
               || CPP_OPTION (pfile, lang) == CLK_GNUCXX14)
        _cpp_define_builtin (pfile, "__cplusplus 201402L");
      else if (CPP_OPTION (pfile, lang) == CLK_CXX11
               || CPP_OPTION (pfile, lang) == CLK_GNUCXX11)
        _cpp_define_builtin (pfile, "__cplusplus 201103L");
      else
        _cpp_define_builtin (pfile, "__cplusplus 199711L");
    }
  else if (CPP_OPTION (pfile, lang) == CLK_ASM)
    _cpp_define_builtin (pfile, "__ASSEMBLER__ 1");
  else if (CPP_OPTION (pfile, lang) == CLK_STDC94)
    _cpp_define_builtin (pfile, "__STDC_VERSION__ 199409L");
  else if (CPP_OPTION (pfile, lang) == CLK_STDC23
           || CPP_OPTION (pfile, lang) == CLK_GNUC23)
    _cpp_define_builtin (pfile, "__STDC_VERSION__ 202311L");
  else if (CPP_OPTION (pfile, lang) == CLK_STDC2Y
           || CPP_OPTION (pfile, lang) == CLK_GNUC2Y)
    _cpp_define_builtin (pfile, "__STDC_VERSION__ 202500L");
  else if (CPP_OPTION (pfile, lang) == CLK_STDC17
           || CPP_OPTION (pfile, lang) == CLK_GNUC17)
    _cpp_define_builtin (pfile, "__STDC_VERSION__ 201710L");
  else if (CPP_OPTION (pfile, lang) == CLK_STDC11
           || CPP_OPTION (pfile, lang) == CLK_GNUC11)
    _cpp_define_builtin (pfile, "__STDC_VERSION__ 201112L");
  else if (CPP_OPTION (pfile, c99))
    _cpp_define_builtin (pfile, "__STDC_VERSION__ 199901L");

  if (CPP_OPTION (pfile, uliterals)
      && !(CPP_OPTION (pfile, cplusplus)
           && (CPP_OPTION (pfile, lang) == CLK_GNUCXX
               || CPP_OPTION (pfile, lang) == CLK_CXX98)))
    {
      _cpp_define_builtin (pfile, "__STDC_UTF_16__ 1");
      _cpp_define_builtin (pfile, "__STDC_UTF_32__ 1");
    }

  if (hosted)
    _cpp_define_builtin (pfile, "__STDC_HOSTED__ 1");
  else
    _cpp_define_builtin (pfile, "__STDC_HOSTED__ 0");

  _cpp_define_builtin (pfile, "__STDC_EMBED_NOT_FOUND__ 0");
  _cpp_define_builtin (pfile, "__STDC_EMBED_FOUND__ 1");
  _cpp_define_builtin (pfile, "__STDC_EMBED_EMPTY__ 2");

  if (CPP_OPTION (pfile, objc))
    _cpp_define_builtin (pfile, "__OBJC__ 1");
}

/* The sanity checks depend on command-line options, so sanity_checks
   is called from cpp_post_options, after the options have been
   parsed. */
#if CHECKING_P
static void sanity_checks (cpp_reader *);
static void sanity_checks (cpp_reader *pfile)
{
  cppchar_t test = 0;
  size_t max_precision = 2 * CHAR_BIT * sizeof (cpp_num_part);

  /* Sanity checks for assumptions about CPP arithmetic and target
     type precisions made by cpplib. */
  test--;
  if (test < 1)
    cpp_error (pfile, CPP_DL_ICE, "cppchar_t must be an unsigned type");

  if (CPP_OPTION (pfile, precision) > max_precision)
    cpp_error (pfile, CPP_DL_ICE,
               "preprocessor arithmetic has maximum precision of %lu bits;"
               " target requires %lu bits",
               (unsigned long) max_precision,
               (unsigned long) CPP_OPTION (pfile, precision));

  if (CPP_OPTION (pfile, precision) < CPP_OPTION (pfile, int_precision))
    cpp_error (pfile, CPP_DL_ICE,
               "CPP arithmetic must be at least as precise as a target int");

  if (CPP_OPTION (pfile, char_precision) < 8)
    cpp_error (pfile, CPP_DL_ICE, "target char is less than 8 bits wide");

  if (CPP_OPTION (pfile, wchar_precision) < CPP_OPTION (pfile, char_precision))
    cpp_error (pfile, CPP_DL_ICE,
               "target wchar_t is narrower than target char");

  if (CPP_OPTION (pfile, int_precision) < CPP_OPTION (pfile, char_precision))
    cpp_error (pfile, CPP_DL_ICE,
               "target int is narrower than target char");

  /* This is assumed in eval_token() and could be fixed if necessary. */
  if (sizeof (cppchar_t) > sizeof (cpp_num_part))
    cpp_error (pfile, CPP_DL_ICE,
               "CPP half-integer narrower than CPP character");

  if (CPP_OPTION (pfile, wchar_precision) > BITS_PER_CPPCHAR_T)
    cpp_error (pfile, CPP_DL_ICE,
               "CPP on this host cannot handle wide character constants over"
               " %lu bits, but the target requires %lu bits",
               (unsigned long) BITS_PER_CPPCHAR_T,
               (unsigned long) CPP_OPTION (pfile, wchar_precision));
}
#else
# define sanity_checks(PFILE)
#endif

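The first check in sanity_checks relies on wraparound: decrementing zero yields the maximum value for an unsigned type and -1 for a signed one, so only a signed type compares less than 1 afterwards. A minimal standalone sketch of that idea follows; the template name is hypothetical and not part of libcpp.

```cpp
#include <cassert>

// Hypothetical helper, not libcpp code: mirrors the "test--; test < 1"
// idiom used for cppchar_t.  For an unsigned T, 0 - 1 wraps to the
// maximum value, which is not less than 1; for a signed T it is -1.
template <typename T>
bool
wraps_like_unsigned ()
{
  T test = 0;
  test--;
  return !(test < 1);
}
```

In new code one would more likely write std::is_unsigned<T>::value from <type_traits>; the runtime form is shown only because it matches what the function above does.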
/* This is called after options have been parsed, and partially
   processed. */
void
cpp_post_options (cpp_reader *pfile)
{
  int flags;

  sanity_checks (pfile);

  post_options (pfile);

  /* Mark named operators before handling command line macros. */
  flags = 0;
  if (CPP_OPTION (pfile, cplusplus) && CPP_OPTION (pfile, operator_names))
    flags |= NODE_OPERATOR;
  if (CPP_OPTION (pfile, warn_cxx_operator_names))
    flags |= NODE_DIAGNOSTIC | NODE_WARN_OPERATOR;
  if (flags != 0)
    mark_named_operators (pfile, flags);
}

/* Setup for processing input from the file named FNAME, or stdin if
   it is the empty string. Return the original filename on success
   (e.g. foo.i->foo.c), or NULL on failure. INJECTING is true if
   there may be injected headers before line 1 of the main file. */
const char *
cpp_read_main_file (cpp_reader *pfile, const char *fname, bool injecting)
{
  if (mkdeps *deps = cpp_get_deps (pfile))
    /* Set the default target (if there is none already). */
    deps_add_default_target (deps, fname);

  pfile->main_file
    = _cpp_find_file (pfile, fname,
                      CPP_OPTION (pfile, preprocessed) ? &pfile->no_search_path
                      : CPP_OPTION (pfile, main_search) == CMS_user
                      ? pfile->quote_include
                      : CPP_OPTION (pfile, main_search) == CMS_system
                      ? pfile->bracket_include : &pfile->no_search_path,
                      /*angle=*/0, _cpp_FFK_NORMAL, 0);

  if (_cpp_find_failed (pfile->main_file))
    return NULL;

  _cpp_stack_file (pfile, pfile->main_file,
                   injecting || CPP_OPTION (pfile, preprocessed)
                   ? IT_PRE_MAIN : IT_MAIN, 0);

  /* For foo.i, read the original filename foo.c now, for the benefit
     of the front ends. */
  if (CPP_OPTION (pfile, preprocessed))
    if (!read_original_filename (pfile))
      {
        /* We're on line 1 after all. */
        auto *last = linemap_check_ordinary
          (LINEMAPS_LAST_MAP (pfile->line_table, false));
        last->to_line = 1;
        /* Inform of as-if a file change. */
        _cpp_do_file_change (pfile, LC_RENAME_VERBATIM, LINEMAP_FILE (last),
                             LINEMAP_LINE (last), LINEMAP_SYSP (last));
      }

  auto *map = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
  pfile->main_loc = MAP_START_LOCATION (map);

  return ORDINARY_MAP_FILE_NAME (map);
}

location_t
cpp_main_loc (const cpp_reader *pfile)
{
  return pfile->main_loc;
}

/* For preprocessed files, if the very first characters are
   '#<SPACE>[01]<SPACE>', then handle a line directive so we know the
   original file name. This will generate file_change callbacks,
   which the front ends must handle appropriately given their state of
   initialization. We peek directly into the character buffer, so
   that we're not confused by otherwise-skipped white space &
   comments. We can be very picky, because this should have been
   machine-generated text (by us, no less). This way we do not
   interfere with the module directive state machine. */

static bool
read_original_filename (cpp_reader *pfile)
{
  auto *buf = pfile->buffer->next_line;

  if (pfile->buffer->rlimit - buf > 4
      && buf[0] == '#'
      && buf[1] == ' '
      // Also permit '1', as that's what used to be here
      && (buf[2] == '0' || buf[2] == '1')
      && buf[3] == ' ')
    {
      const cpp_token *token = _cpp_lex_direct (pfile);
      gcc_checking_assert (token->type == CPP_HASH);
      if (_cpp_handle_directive (pfile, token->flags & PREV_WHITE))
        {
          read_original_directory (pfile);

          auto *penult = &linemap_check_ordinary
            (LINEMAPS_LAST_MAP (pfile->line_table, false))[-1];
          if (penult[1].reason == LC_RENAME_VERBATIM)
            {
              /* Expunge any evidence of the original linemap. */
              pfile->line_table->highest_location
                = pfile->line_table->highest_line
                = penult[0].start_location;

              penult[1].start_location = penult[0].start_location;
              penult[1].reason = penult[0].reason;
              penult[0] = penult[1];
              pfile->line_table->info_ordinary.used--;
              pfile->line_table->info_ordinary.m_cache = 0;
            }

          return true;
        }
    }

  return false;
}

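The buffer peek that guards read_original_filename can be tried in isolation. The sketch below restates only the prefix test on a std::string instead of libcpp's buffer pointers; the function name is hypothetical and nothing here touches the lexer or the line maps.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not libcpp code: true if BUF is longer than
// four characters and begins with "# 0 " or "# 1 ", the pattern
// read_original_filename looks for before lexing a line directive.
bool
starts_with_initial_line_marker (const std::string &buf)
{
  return buf.size () > 4
         && buf[0] == '#'
         && buf[1] == ' '
         && (buf[2] == '0' || buf[2] == '1')
         && buf[3] == ' ';
}
```

A line such as `# 0 "foo.c"` passes the test, while `#line 1 "foo.c"` or `# 2 "foo.c"` does not, which is the pickiness the comment above describes.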
/* For preprocessed files, if the tokens following the first filename
   line are of the form # <line> "/path/name//", handle the
   directive so we know the original current directory.

   As with the first line peeking, we can do this without lexing by
   being picky. */
static void
read_original_directory (cpp_reader *pfile)
{
  auto *buf = pfile->buffer->next_line;

  if (pfile->buffer->rlimit - buf > 4
      && buf[0] == '#'
      && buf[1] == ' '
      // Also permit '1', as that's what used to be here
      && (buf[2] == '0' || buf[2] == '1')
      && buf[3] == ' ')
    {
      const cpp_token *hash = _cpp_lex_direct (pfile);
      gcc_checking_assert (hash->type == CPP_HASH);
      pfile->state.in_directive = 1;
      const cpp_token *number = _cpp_lex_direct (pfile);
      gcc_checking_assert (number->type == CPP_NUMBER);
      const cpp_token *string = _cpp_lex_direct (pfile);
      pfile->state.in_directive = 0;

      const unsigned char *text = nullptr;
      size_t len = 0;
      if (string->type == CPP_STRING)
        {
          /* The string value includes the quotes. */
          text = string->val.str.text;
          len = string->val.str.len;
        }
      if (len < 5
          || !IS_DIR_SEPARATOR (text[len - 2])
          || !IS_DIR_SEPARATOR (text[len - 3]))
        {
          /* That didn't work out, back out. */
          _cpp_backup_tokens (pfile, 3);
          return;
        }

      if (pfile->cb.dir_change)
        {
          /* Smash the string directly; it's dead at this point. */
          char *smashy = (char *)text;
          smashy[len - 3] = 0;

          pfile->cb.dir_change (pfile, smashy + 1);
        }

      /* We should be at EOL. */
    }
}

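The index arithmetic in read_original_directory (the string includes its quotes, the two characters before the closing quote must be directory separators, and the result starts one past the opening quote) can be sketched on its own. The helper below is hypothetical and simplified: it copies the result instead of overwriting the token text, and it assumes '/' is the only directory separator, whereas IS_DIR_SEPARATOR may also accept other characters on some hosts.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not libcpp code.  QUOTED is the spelling of a
// string token including both quotes, e.g. "\"/path/name//\"".  On
// success, store the directory without quotes and without the
// trailing "//" in DIR and return true.
bool
original_directory (const std::string &quoted, std::string &dir)
{
  const std::string::size_type len = quoted.size ();

  // Assumption: only '/' counts as a directory separator here.
  if (len < 5 || quoted[len - 2] != '/' || quoted[len - 3] != '/')
    return false;

  // Skip the opening quote; stop before the first of the two slashes.
  dir = quoted.substr (1, len - 4);
  return true;
}
```

For the token `"/path/name//"` this yields `/path/name`, matching what the callback receives after the in-place truncation at index len - 3.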
/* This is called at the end of preprocessing. It pops the last
   buffer and writes dependency output.

   Maybe it should also reset state, such that you could call
   cpp_start_read with a new filename to restart processing. */
void
cpp_finish (struct cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
{
  /* Warn about unused macros before popping the final buffer. */
  if (CPP_OPTION (pfile, warn_unused_macros))
    cpp_forall_identifiers (pfile, _cpp_warn_if_unused_macro, NULL);

  /* lex.cc leaves the final buffer on the stack. This is so that
     it returns an unending stream of CPP_EOFs to the client. If we
     popped the buffer, we'd dereference a NULL buffer pointer and
     segfault. It's nice to allow the client to do worry-free excess
     cpp_get_token calls. */
  while (pfile->buffer)
    _cpp_pop_buffer (pfile);

  cpp_fdeps_format fdeps_format = CPP_OPTION (pfile, deps.fdeps_format);
  if (fdeps_format == FDEPS_FMT_P1689R5 && fdeps_stream)
    deps_write_p1689r5 (pfile->deps, fdeps_stream);

  if (CPP_OPTION (pfile, deps.style) != DEPS_NONE
      && deps_stream)
    {
      deps_write (pfile, deps_stream, 72);
    }

  /* Report on headers that could use multiple include guards. */
  if (CPP_OPTION (pfile, print_include_names))
    _cpp_report_missing_guards (pfile);
}

static void
post_options (cpp_reader *pfile)
{
  /* -Wtraditional is not useful in C++ mode. */
  if (CPP_OPTION (pfile, cplusplus))
    CPP_OPTION (pfile, cpp_warn_traditional) = 0;

  /* Permanently disable macro expansion if we are rescanning
     preprocessed text. Read preprocessed source in ISO mode. */
  if (CPP_OPTION (pfile, preprocessed))
    {
      if (!CPP_OPTION (pfile, directives_only))
        pfile->state.prevent_expansion = 1;
      CPP_OPTION (pfile, traditional) = 0;
    }

  if (CPP_OPTION (pfile, warn_trigraphs) == 2)
    CPP_OPTION (pfile, warn_trigraphs) = !CPP_OPTION (pfile, trigraphs);

  if (CPP_OPTION (pfile, traditional))
    {
      CPP_OPTION (pfile, trigraphs) = 0;
      CPP_OPTION (pfile, warn_trigraphs) = 0;
    }

  if (CPP_OPTION (pfile, module_directives))
    {
      /* These unspellable tokens have a leading space. */
      const char *const inits[spec_nodes::M_HWM]
        = {"export ", "module ", "import ", "__import"};

      for (int ix = 0; ix != spec_nodes::M_HWM; ix++)
        {
          cpp_hashnode *node = cpp_lookup (pfile, UC (inits[ix]),
                                           strlen (inits[ix]));

          /* Token we pass to the compiler. */
          pfile->spec_nodes.n_modules[ix][1] = node;

          if (ix != spec_nodes::M__IMPORT)
            /* Token we recognize when lexing, drop the trailing ' '. */
            node = cpp_lookup (pfile, NODE_NAME (node), NODE_LEN (node) - 1);

          node->flags |= NODE_MODULE;
          pfile->spec_nodes.n_modules[ix][0] = node;
        }
    }
}