The present invention is particularly useful where the programming language has a relatively loosely integrated macro processor. While the invention is not restricted to any specific language, the present explanation uses the C language as an illustrative example of a language having such a loosely integrated macro processor.
One well-known shortcoming of the C programming language is the poor integration of its macro preprocessor with the C grammar. This poses many problems for C language tools, such as structure editors, view-oriented program browsers, and program transformation tools, that try to integrate the syntax and semantics of the preprocessor directives within the underlying program structure.
Program databases, used in integrated programming support environments, must contain all of the original information found in the program text in order to present complete structural information to a programmer. In many of these environments, the syntax and semantics of programs are commonly represented with attributed abstract syntax trees (AST's). AST's are easy to generate and manipulate and conveniently reflect the structure of the programs they represent. However, for the case of C, AST's are not sufficient. The C preprocessor (cpp), which provides conditional compilation and macro substitution, supports features that cannot be described by a tree-structured representation and are not amenable to existing parsing techniques. For instance, a program containing a single #if is actually two programs: one where the controlling expression is true and the other where it is false. Because cpp runs before the parser, it removes much information that the parser has no chance of recovering. Even if the parser had access to this information, it would be difficult to represent these two views of the program concisely with a single AST. Thus a forest of AST's or, more compactly, a single abstract syntax graph (ASG) is needed for the representation.
In most C language compilers and tools, programs are first processed by a macro preprocessor. The resulting text is then passed through a lexical and syntax analysis phase. In many respects, this preprocessor has contributed to C's power as a systems programming language. The following list of preprocessor directives points out the power of the preprocessor and also illustrates some of the difficulties the directives pose:
#define name token-string
Replace subsequent instances of name with token-string.
#define name(arg[, arg]...) token-string
Define a parameterized macro. A macro definition includes a macro body and, optionally, a sequence of formal parameters. A macro body is an arbitrary sequence of tokens. Within a macro body, parameters are replaced by the actual arguments of the macro at the use site.
#include "file name"
Include the text of one file within another file. This directive is most often used to import interface information from other modules.
#if constant-expression
Conditionally include or exclude selected portions of text from a source file. The #if directive evaluates the constant expression and, if the expression is true (nonzero), includes the text of its body up to the matching #else, #elif, or #endif. This directive is widely used to construct modules that are portable across different operating systems and machine architectures. It also allows the conditional inclusion of various features within a module.
#ifdef name
Similar to the #if directive, except that the text up to the matching #else, #elif, or #endif is included if name is defined at that point.
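By way of illustration only, the following C fragment exercises the directives listed above and shows how a single #if selects one of two programs before the parser sees any text. The names BUFFER_SIZE, MAX, USE_WIDE_COUNTER, counter_t, buffer_class, larger_of, and report_variant are hypothetical and are not taken from the disclosure; the fragment is a minimal sketch, not part of the invention.

```c
#include <assert.h>
#include <stdio.h>

/* Object-like macro: each later use of BUFFER_SIZE is replaced by 64. */
#define BUFFER_SIZE 64

/* Parameterized macro: the formals a and b are replaced by the actual
   argument tokens at each use site. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* USE_WIDE_COUNTER is a hypothetical configuration name. It is not
   defined in this file, so only the #else branch reaches the parser. */
#ifdef USE_WIDE_COUNTER
typedef long counter_t;
#else
typedef int counter_t;
#endif

/* One #if, two programs: the parser receives exactly one of the two
   definitions of buffer_class and never learns the other existed. */
#if BUFFER_SIZE > 128
const char *buffer_class(void) { return "large"; }
#else
const char *buffer_class(void) { return "small"; }
#endif

/* After preprocessing, the body below contains the expanded conditional
   expression, not the name MAX. */
counter_t larger_of(counter_t x, counter_t y) { return MAX(x, y); }

/* Small driver that reports which variant was compiled; call it from main(). */
void report_variant(void) {
    assert(MAX(2, 3) == 3);
    printf("buffer class: %s, counter size: %zu bytes\n",
           buffer_class(), sizeof(counter_t));
}
```

Compiling the same file with USE_WIDE_COUNTER defined, or with a larger BUFFER_SIZE, would hand the parser a different token stream, which is why a single tree built from the preprocessed text cannot describe every version of the module.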
Various problems arise with the integration of the preprocessor with the syntax analysis phase. For example, the preprocessor's grammar is not integrated with the C grammar, and since preprocessor tokens can appear anywhere in a string of C program tokens, the two grammars cannot be combined. Furthermore, conditional compilation directives induce multiple parse trees, or versions, of the same module. Ideally, the syntactic and semantic information of these induced versions should be represented in one integrated program representation structure. In such a structure, macro uses should be accessible both simply as macros, similar in syntax to identifiers or function calls, and in their expanded forms. Another problem arises because the body of a macro definition is not required to be a complete syntactic unit in the underlying C grammar. The text of an included file presents a similar difficulty: it is not required to be syntactically complete, so a syntactic unit can span multiple files.
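The difficulties described above can be made concrete with a short C sketch. The macros FOR_EACH_INDEX, END_FOR_EACH, and PLUS_ONE, the configuration name SCALE_BY_TEN, and the functions sum_below, scaled, and run_checks are hypothetical names chosen for illustration only. None of the macro bodies is a complete syntactic unit, and a conditional directive appears in the middle of an expression.

```c
#include <assert.h>

/* Neither body is a complete C construct by itself: FOR_EACH_INDEX opens
   a for statement and a block, and END_FOR_EACH supplies the closing brace. */
#define FOR_EACH_INDEX(i, n) for (int i = 0; i < (n); ++i) {
#define END_FOR_EACH }

/* A macro body that is only the tail of an expression. */
#define PLUS_ONE + 1

/* The loop only becomes a well-formed statement after macro expansion. */
int sum_below(int n) {
    int total = 0;
    FOR_EACH_INDEX(k, n)
        total += k;
    END_FOR_EACH
    return total;
}

/* Preprocessor directives interrupt a single return statement.
   SCALE_BY_TEN is not defined in this file, so the "* 2" branch is kept. */
int scaled(int x) {
    return x
#ifdef SCALE_BY_TEN
        * 10
#else
        * 2
#endif
        PLUS_ONE;
}

/* Checks for the variant compiled here; call from main(). */
void run_checks(void) {
    assert(sum_below(4) == 6);   /* 0 + 1 + 2 + 3 */
    assert(scaled(5) == 11);     /* 5 * 2 + 1 */
}
```

A tool that parses the unexpanded text sees unbalanced braces and a dangling operator, while a tool that parses only the expanded text loses the macro names and the discarded "* 10" branch. The same situation arises when the opening and closing parts of a construct are placed in different included files.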