This specification relates to emulating source code compilers.
Emulating compilers has a number of useful applications, one of which is static analysis of source code. Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program.
Source code in a code base is typically compiled in a build environment containing a build system. The build environment includes an operating system; a file system; executable files, e.g., compilers; environment variables, e.g., variables that indicate a path to file system directories that contain library files or executable files; and other configuration files for building source code in the code base.
Many compilers have a preprocessor that runs before the compiler is called. Preprocessors can make arbitrary textual substitutions in existing source code files before the compiler is called to compile the modified source code. Preprocessors can also generate temporary source code files that are compiled but then deleted by the build system when compilation is complete.
The behavior of most compilers is significantly influenced by configuration properties of the compilers. Configuration properties of a compiler include both extrinsic configuration properties, e.g., command line flags passed to the compiler by a build system, as well as inherent configuration properties of the compiler version. Inherent configuration properties of a compiler include built-in search paths, built-in types, built-in macros, and built-in functions, all of which influence the behavior of a compiler and all of which can vary by compiler version and by underlying operating system. In addition, extrinsic configuration properties like command line flags can alter inherent configuration properties of the compiler, e.g., built-in search paths.
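The interaction between extrinsic and inherent configuration properties can be illustrated with a minimal C++ sketch. It assumes GCC-style flags, where "-I<dir>" adds a directory that is searched before the built-in ones and "-nostdinc" drops the built-in directories. The names CompilerConfig and effective_search_paths are hypothetical and chosen only for illustration, and the sketch ignores other flags and the separated "-I <dir>" form.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one inherent configuration property of a compiler.
struct CompilerConfig {
    std::vector<std::string> builtin_search_paths;
};

// Computes the effective header search order from the built-in paths and the
// command line flags. Assumes GCC-style semantics: "-I<dir>" directories are
// searched first, and "-nostdinc" removes the built-in directories.
std::vector<std::string> effective_search_paths(
    const CompilerConfig& config, const std::vector<std::string>& flags) {
    std::vector<std::string> result;
    bool use_builtin = true;
    for (const std::string& flag : flags) {
        if (flag == "-nostdinc") {
            use_builtin = false;
        } else if (flag.size() > 2 && flag.compare(0, 2, "-I") == 0) {
            result.push_back(flag.substr(2));  // keep the directory part only
        }
        // All other flags are ignored in this sketch.
    }
    if (use_builtin) {
        result.insert(result.end(), config.builtin_search_paths.begin(),
                      config.builtin_search_paths.end());
    }
    return result;
}
```

Under these assumptions, passing "-Iinc" yields a search order of "inc" followed by the built-in directories, while adding "-nostdinc" leaves only "inc".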
Despite detailed language specifications of modern source code languages, there still exist many valid source code constructs whose implementation is defined by and specific to the compiler being used. For example, the C++ standard specifies that a preprocessing directive of the form:
#include <h-char-sequence> new-line
directs a compiler to search a sequence of locations for a header identified by the characters within the < and > delimiters. The sequence of locations that the compiler will search is implementation-specific. Thus, different C++ compilers may search for the identified header in different places, which can result in different header definitions being imported into the code during compilation.
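The effect of the search order can be sketched as follows. This is an illustrative model only, not the lookup procedure of any particular compiler: the function resolve_header and the example paths are hypothetical, and a set of known file paths stands in for the file system.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the path of the header in the first directory of `search_paths`
// that contains it, or an empty string when no directory does.
// `existing_files` stands in for the file system in this sketch.
std::string resolve_header(const std::string& header,
                           const std::vector<std::string>& search_paths,
                           const std::set<std::string>& existing_files) {
    for (const std::string& dir : search_paths) {
        const std::string candidate = dir + "/" + header;
        if (existing_files.count(candidate) > 0) {
            return candidate;  // the first match wins
        }
    }
    return std::string();
}
```

If both "/opt/a/include/config.h" and "/opt/b/include/config.h" exist, a compiler that searches "/opt/a/include" first resolves the same directive to a different file than one that searches "/opt/b/include" first.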
As another example, the behavior of some preprocessing directives depends on the state of the preprocessor, e.g., whether or not the preprocessor has a particular built-in macro. In the following segment of source code, both the type of the variable “x” and its initial value depend on whether or not the preprocessor of the compiler being used has the built-in macro “_MSC_VER”:
#ifdef _MSC_VER
int x = 1;
#else
float x = 2;
#endif
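To make the dependence on preprocessor state concrete, the following C++ sketch selects the lines that survive a single conditional region for a given set of built-in macros. It is a deliberately simplified, hypothetical helper (select_active_lines is not part of any real preprocessor): it handles only one non-nested "#ifdef NAME" / "#else" / "#endif" region with each directive on its own line.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Returns the source lines kept by a single, non-nested
// #ifdef / #else / #endif region, given the built-in macros of the
// preprocessor being modeled. Directive lines themselves are dropped.
std::string select_active_lines(const std::string& source,
                                const std::set<std::string>& builtin_macros) {
    std::istringstream in(source);
    std::string line;
    std::string output;
    bool active = true;  // whether the current lines are kept
    while (std::getline(in, line)) {
        if (line.compare(0, 7, "#ifdef ") == 0) {
            active = builtin_macros.count(line.substr(7)) > 0;
        } else if (line == "#else") {
            active = !active;
        } else if (line == "#endif") {
            active = true;
        } else if (active) {
            output += line + "\n";
        }
    }
    return output;
}
```

Applied to a segment that declares "int x = 1;" under "#ifdef _MSC_VER" and "float x = 2;" under "#else", the helper keeps the int declaration when the macro set contains "_MSC_VER" and the float declaration when it does not.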
In some situations, source code will compile for some compilers but not for others. This is the case when source code calls a built-in function that is defined by one compiler but not by another. For example, the following source code will typically compile for compilers that have the function “__builtin_bswap64” defined, e.g., GCC, but will fail for compilers that do not, e.g., Microsoft Visual C++:
x = __builtin_bswap64(x);
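For illustration, source code is often made portable across such compilers by guarding the built-in call with compiler-identifying macros. The sketch below is one common pattern and is not taken from this specification; the wrapper name byte_swap64 is hypothetical. It assumes that GCC and Clang provide the built-in and that Microsoft Visual C++ provides the _byteswap_uint64 routine declared in <stdlib.h>.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // declares _byteswap_uint64 on Microsoft Visual C++
#endif

// Reverses the byte order of a 64-bit value, using a compiler built-in
// where one is assumed to exist and a portable loop otherwise.
inline std::uint64_t byte_swap64(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(x);
#elif defined(_MSC_VER)
    return _byteswap_uint64(x);
#else
    // Portable fallback: move the lowest byte of x into result, eight times.
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        result = (result << 8) | (x & 0xFF);
        x >>= 8;
    }
    return result;
#endif
}
```

Which branch is compiled depends on the built-in macros of the compiler in use, so a tool that analyzes this code without the same macros and built-in functions would see a different function body than the real compiler does.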