The present invention relates generally to compilers and program analyzers, and more particularly to an improved system and method for lexing and parsing computer programs that include tool-specific annotations.
A compiler or a source-level program analyzer is capable of parsing source programs, which are written in a particular programming-language. Compilers generally include a lexer and a parser. Similarly, other types of programming tools include a lexer and parser. The lexer reads the source-level program and generates tokens based upon the programming-language statements in the source-level program. The lexer passes the generated tokens to the parser, which assembles the tokens into an abstract syntax tree (AST). The abstract syntax tree is further processed by one or more tools, such as a compiler back-end or a program correctness tester.
Tool-specific annotations are typically used in the source program to give the tools special instructions; for example, “generate the following machine code instruction at this point in the target code,” “generate code that uses a machine register for this program variable,” “ignore possible errors of type x in this program statement,” or “check that this parameter is always a non-zero integer.” As new tools are devised, and as new features are added to those tools, the lexer and parser used by the tools will often require corresponding revisions.
The present invention addresses the problem of revising the lexer and parser for a programming-language when new tools are created, or new annotation-based features are added to tools. In particular, using the present invention, tool-specific annotations are effectively separated from programming-language-specific statements. Further, the present invention makes it relatively simple to implement a wide range of tool-specific annotations, including annotations that employ a complex programming-language.
Two conventional approaches that allow tool-specific annotations are known. In a first approach, tool-specific annotations are recognized and processed by the lexer. In a second approach, tool-specific annotations are recognized and processed by the parser.
An example of the first conventional approach to supporting tool-specific annotations is the way a “#line N” tool-specific annotation may be handled by a C compiler. There, the C compiler lexer may keep track of the line number information of every token it recognizes. If the C compiler lexer reads the “#line N” annotation, then the C compiler lexer changes an internal counter to N, as if the next line were N, and proceeds to read the next token. Since the lexical structure of the “#line N” annotation is so simple, a standard lexer, such as the C compiler lexer, can recognize the tool-specific annotation.
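The first conventional approach can be illustrated with a minimal sketch. It is not taken from any actual C compiler; the function name `lex_with_line_directives` and the simplified token pattern are assumptions made only for illustration. The sketch shows a lexer that records a line number for every token and resets its internal counter when it reads a “#line N” annotation.

```python
import re

def lex_with_line_directives(source):
    """Return (line_number, token_text) pairs, honoring '#line N' annotations.

    Hypothetical sketch: the token pattern is far simpler than real C lexing.
    """
    line_directive = re.compile(r"\s*#line\s+(\d+)\s*$")
    token_pattern = re.compile(r"[A-Za-z_]\w*|\d+|\S")
    line_number = 1
    tokens = []
    for raw_line in source.splitlines():
        match = line_directive.match(raw_line)
        if match:
            # The annotation produces no token; it only resets the counter
            # so that the next physical line is reported as line N.
            line_number = int(match.group(1))
            continue
        for text in token_pattern.findall(raw_line):
            tokens.append((line_number, text))
        line_number += 1
    return tokens
```

Note that the annotation is consumed entirely inside the lexer, so the parser never sees it. This works only because the annotation has a trivial lexical structure.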
An example of the second conventional approach to supporting tool-specific annotations in a compiler is the way a compiler for the Modula-3 language handles an “<* ASSERT P *>” tool-specific annotation. It is treated as if it were a Modula-3 program statement. Although “P” is an expression, it can be parsed appropriately because the annotation is recognized by the Modula-3 parser.
The conventional methods for recognizing tool-specific annotations, while functional, are less than satisfactory in practice. If a new tool (such as a type-checker or an error-checker) is created for a particular programming-language, extensive recoding of the standard programming-language lexer and parser may be required to handle program annotations specific to that tool. Even a simple modification made to the syntax of the annotations used by an existing tool may require extensive modification of the lexer and parser of that tool.
In the system and methods of the present invention, tool-specific annotations are recognized by the lexer for the programming-language, but the lexing and parsing of the tool-specific annotations are handled by a separate, tool-specific annotation processor.
A compiler or other programming tool includes a lexer capable of detecting computer programming-language units present in a character stream. The lexer generates a stream of tokens based upon these units. The lexer is further capable of detecting the units of computer programming-language statements such as identifiers. As the lexer detects tool-specific annotations in the character stream, it passes them to the back-end annotation processor. The back-end annotation processor is designed to lex and parse the annotations for a specific tool (or set of tools). In a system having a plurality of tools that use different tool-specific annotations, the back-end of the system will have a corresponding set of tool-specific annotation processors.
When the back-end annotation processor receives a tool-specific annotation from the lexer, the annotation processor generates an annotation token based upon the tool-specific annotation and returns the annotation token to the lexer. The lexer in turn adds the annotation token to the end of a list of tokens it has generated so far. The lexer passes the mixed stream of tokens, some generated within the lexer, and some generated by the back-end annotation processor, to the parser. The parser assembles the stream of tokens and annotation tokens into an abstract syntax tree and passes the tree to the aforementioned tool. The tool processes the annotation tokens as well as the other tokens in the abstract syntax tree.
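The flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class names `Token`, `AnnotationProcessor`, and `Lexer`, and the `/*@ ... */` annotation syntax, are hypothetical choices that do not appear in the description. The lexer only recognizes where an annotation starts and ends, hands its text to the annotation processor, and appends the returned annotation token to the list of tokens generated so far.

```python
import re
from dataclasses import dataclass

@dataclass
class Token:
    kind: str      # "ident", "number", "punct", or "annotation"
    value: object

class AnnotationProcessor:
    """Hypothetical tool-specific back end: turns annotation text into a token."""
    def process(self, text):
        return Token("annotation", text.strip())

class Lexer:
    # Assumed annotation syntax for this sketch only: /*@ ... */
    _pattern = re.compile(
        r"/\*@(?P<annotation>.*?)\*/"
        r"|(?P<ident>[A-Za-z_]\w*)"
        r"|(?P<number>\d+)"
        r"|(?P<punct>[^\sA-Za-z_0-9])",
        re.DOTALL,
    )

    def __init__(self, annotation_processor):
        self.annotation_processor = annotation_processor

    def tokenize(self, source):
        tokens = []
        for match in self._pattern.finditer(source):
            kind = match.lastgroup
            if kind == "annotation":
                # Delegate: the lexer does not interpret the annotation itself.
                text = match.group("annotation")
                tokens.append(self.annotation_processor.process(text))
            else:
                tokens.append(Token(kind, match.group(kind)))
        return tokens
```

With this split, supporting a new tool would mean supplying a different `AnnotationProcessor`, while the `Lexer` for the programming language stays unchanged.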
In a preferred embodiment, at least one of the annotation processors has the capability of generating an annotation token that includes an abstract syntax tree within the annotation token. The abstract syntax tree within the annotation token may be referred to as a secondary abstract syntax tree and the abstract syntax tree assembled by the parser may be referred to as the primary abstract syntax tree. In this embodiment, the annotation token including a secondary abstract syntax tree is incorporated into the primary abstract syntax tree in a context-sensitive manner by the parser.
In a preferred embodiment, an annotation processor includes an annotation lexer and an annotation parser. Preferably, the annotation lexer is context-free and the annotation parser is context-sensitive.
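An annotation processor built from an annotation lexer and an annotation parser, and returning a token that carries a secondary abstract syntax tree, might look like the sketch below. All names (`AnnotationToken`, `annotation_lex`, `annotation_parse`, `process_annotation`) and the tiny annotation grammar `KEYWORD operand OP operand` are assumptions introduced for illustration. The sketch does not model the context sensitivity of the preferred annotation parser.

```python
import re
from dataclasses import dataclass

@dataclass
class AnnotationToken:
    keyword: str
    tree: tuple   # secondary abstract syntax tree carried inside the token

def annotation_lex(text):
    """Annotation lexer: split annotation text into names, numbers, operators."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[<>!=]=|[<>]", text)

def annotation_parse(words):
    """Annotation parser for the hypothetical form: KEYWORD operand OP operand."""
    if len(words) != 4:
        raise ValueError(f"unsupported annotation: {words!r}")
    keyword, left, op, right = words

    def operand(word):
        # Numbers and variable names become distinct leaf nodes.
        return ("num", int(word)) if word.isdigit() else ("var", word)

    return AnnotationToken(keyword, (op, operand(left), operand(right)))

def process_annotation(text):
    """Annotation processor: lex, then parse, then return one annotation token."""
    return annotation_parse(annotation_lex(text))
```

For example, `process_annotation("assert p != 0")` returns a token whose `tree` is `("!=", ("var", "p"), ("num", 0))`. A parser for the programming language could then attach that token, tree included, as a single node of the primary abstract syntax tree.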