Computer programs are generally written in a high-level programming language (e.g., Java or C). Compilers are then used to translate the instructions of the high-level programming language into machine instructions, which can be executed by a computer. The compilation process is generally divided into six phases:
1. Lexical analysis
2. Syntactic analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Final code generation
During lexical analysis, the source code of the computer program is scanned and components or tokens of the high-level language are identified. The compiler converts the source code into a series of tokens that are processed during syntactic analysis. For example, during lexical analysis, the compiler would identify the statement cTable=1.0; as the variable (cTable), the operator (=), the constant (1.0), and a semicolon. A variable, operator, constant, and semicolon are tokens of the high-level language.
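The tokenization of the example statement can be sketched as follows. This is a minimal illustration, not the scanner of any particular compiler; the names TokenKind, Token, and LexerSketch are hypothetical, and the scanner only recognizes the four token kinds mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token kinds; the names are illustrative only. */
enum TokenKind { VARIABLE, OPERATOR, CONSTANT, SEMICOLON }

/** A token pairs a kind with the matched source text. */
final class Token {
    final TokenKind kind;
    final String text;
    Token(TokenKind kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
}

final class LexerSketch {
    /** Scans a statement left to right and returns its tokens. */
    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not itself a token
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < source.length()
                        && (Character.isLetterOrDigit(source.charAt(i)) || source.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(new Token(TokenKind.VARIABLE, source.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length()
                        && (Character.isDigit(source.charAt(i)) || source.charAt(i) == '.')) {
                    i++;
                }
                tokens.add(new Token(TokenKind.CONSTANT, source.substring(start, i)));
            } else if (c == '=') {
                tokens.add(new Token(TokenKind.OPERATOR, "="));
                i++;
            } else if (c == ';') {
                tokens.add(new Token(TokenKind.SEMICOLON, ";"));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [VARIABLE(cTable), OPERATOR(=), CONSTANT(1.0), SEMICOLON(;)]
        System.out.println(tokenize("cTable=1.0;"));
    }
}
```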
During syntactic analysis (also referred to as “parsing”), the compiler processes the tokens and generates a syntax tree to represent the program based on the syntax (also referred to as “grammar”) of the programming language. A syntax tree is a tree structure in which operators are represented by non-leaf nodes and their operands are represented by child nodes. In the above example, the operator (“=”) has two operands: the variable (cTable) and the constant (1.0). The terms “parse tree” and “syntax tree” are used interchangeably in this description to refer to the syntax-based tree generated as a result of syntactic analysis. For example, such a tree may optionally describe the derivation of the syntactic structure of the computer program (e.g., may describe that a certain token is an identifier, which is an expression as defined by the syntax). Syntax-based trees may also be referred to as “concrete syntax trees,” when the derivation of the syntactic structure is included, and as “abstract syntax trees,” when the derivation is not included.
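The syntax tree for the example statement can be sketched as a small data structure. The class names SyntaxNode and SyntaxTreeSketch are assumptions made for illustration; the tree is built by hand rather than by a parser, and it is abstract in the sense described above (no derivation is recorded).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical syntax tree node: operators are non-leaf nodes, operands are their children. */
final class SyntaxNode {
    final String label;
    final List<SyntaxNode> children;

    SyntaxNode(String label, SyntaxNode... children) {
        this.label = label;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    boolean isLeaf() { return children.isEmpty(); }

    /** Renders the tree in prefix form, e.g. (= cTable 1.0). */
    String render() {
        if (isLeaf()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (SyntaxNode child : children) sb.append(' ').append(child.render());
        return sb.append(')').toString();
    }
}

final class SyntaxTreeSketch {
    /** Builds the tree for cTable=1.0; by hand: "=" with two operand children. */
    static SyntaxNode assignmentExample() {
        return new SyntaxNode("=", new SyntaxNode("cTable"), new SyntaxNode("1.0"));
    }

    public static void main(String[] args) {
        System.out.println(assignmentExample().render()); // prints (= cTable 1.0)
    }
}
```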
During semantic analysis, the compiler modifies the syntax tree to ensure semantic correctness. For example, if the variable (cTable) is an integer and the constant (1.0) is floating point, then during semantic analysis a floating point to integer conversion would be added to the syntax tree.
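A minimal sketch of this tree modification follows. The class names, the type labels "integer" and "floating point", and the conversion node label toInteger are all assumptions chosen for illustration; a real compiler would cover many more type combinations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical syntax tree node that also carries a type label (null for untyped operators). */
final class TypedNode {
    final String label;
    final String type;
    final List<TypedNode> children;

    TypedNode(String label, String type, TypedNode... children) {
        this.label = label;
        this.type = type;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    /** Prefix rendering of the subtree rooted at this node. */
    String render() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (TypedNode child : children) sb.append(' ').append(child.render());
        return sb.append(')').toString();
    }
}

final class SemanticSketch {
    /** Builds the tree for cTable=1.0; where cTable is an integer variable. */
    static TypedNode assignmentExample() {
        return new TypedNode("=", null,
                new TypedNode("cTable", "integer"),
                new TypedNode("1.0", "floating point"));
    }

    /** Wraps the assigned value in a conversion node when an integer target receives a floating-point value. */
    static void checkAssignment(TypedNode assign) {
        TypedNode target = assign.children.get(0);
        TypedNode value = assign.children.get(1);
        if ("integer".equals(target.type) && "floating point".equals(value.type)) {
            assign.children.set(1, new TypedNode("toInteger", "integer", value));
        }
    }

    public static void main(String[] args) {
        TypedNode assign = assignmentExample();
        checkAssignment(assign);
        System.out.println(assign.render()); // prints (= cTable (toInteger 1.0))
    }
}
```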
During intermediate code generation, code optimization, and final code generation, the compiler generates machine instructions to implement the program represented by the syntax tree. A computer can then execute the machine instructions.
A system has been described for generating and maintaining a computer program represented as an intentional program tree, which is a type of syntax tree. (For example, U.S. Pat. No. 5,790,863 entitled “Method and System for Generating and Displaying a Computer Program” and U.S. Pat. No. 6,097,888 entitled “Method and System for Reducing an Intentional Program Tree Represented by High-Level Computational Constructs,” which are hereby incorporated by reference.) The system provides a mechanism for directly manipulating nodes corresponding to syntactic elements by adding, deleting, and moving the nodes within an intentional program tree. An intentional program tree is one type of “program tree.” A “program tree” is a tree representation of a computer program that includes operator nodes and operand nodes. A program tree may also include inter-node references (i.e., graph structures linking nodes in the tree), such as a reference from a declaration node of an identifier to the node that defines that identifier's type. An abstract syntax tree and a concrete syntax tree are examples of a program tree. Once a program tree is generated, the system performs the steps of semantic analysis, intermediate code generation, code optimization, and final code generation to effect the transformation of the computer program represented by the program tree into executable code.
That system also provides editing facilities. The programmer can issue commands for selecting a portion of a program tree, for placing an insertion point in the program tree, and for selecting a type of node to insert at the insertion point. The system allows various commands to be performed relative to the currently selected portion and the current insertion point. For example, the currently selected portion can be copied or cut to a clipboard. The contents of the clipboard can then be pasted from the clipboard to the current insertion point using a paste command. Also, the system provides various commands (e.g., “Paste=”) to insert a new node (e.g., representing an assignment operator) at the current insertion point.
The system displays the program tree to a programmer by generating a display representation of the program tree. A display representation format specifies the visual representation (e.g., textual) of each type of node that may be inserted in a program tree. The system may support display representation formats for several popular programming languages, such as C, Java, Basic, and Lisp. This permits a programmer to select, and change at any time, the display representation format that the system uses to produce a display representation of a program tree. For example, one programmer can select to view a particular program tree in a C display representation format, and another programmer can select to view the same program tree in a Lisp display representation format. Also, one programmer can switch between a C display representation format and a Lisp display representation format for a program tree.
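The idea of interchangeable display representation formats can be sketched as follows. The interface DisplayFormat and the classes below are hypothetical, cover only an assignment node, and use setq as one plausible Lisp-style rendering; they are not the formats of the described system.

```java
/** Minimal assignment node; the field names are illustrative. */
final class AssignmentNode {
    final String target;
    final String value;
    AssignmentNode(String target, String value) { this.target = target; this.value = value; }
}

/** A display representation format maps a node to its visual (here, textual) form. */
interface DisplayFormat {
    String render(AssignmentNode node);
}

/** C-style rendering of an assignment. */
final class CFormat implements DisplayFormat {
    @Override public String render(AssignmentNode node) {
        return node.target + " = " + node.value + ";";
    }
}

/** Lisp-style rendering of the same assignment (setq is an assumed choice). */
final class LispFormat implements DisplayFormat {
    @Override public String render(AssignmentNode node) {
        return "(setq " + node.target + " " + node.value + ")";
    }
}

final class DisplaySketch {
    public static void main(String[] args) {
        AssignmentNode node = new AssignmentNode("cTable", "1.0");
        DisplayFormat format = new CFormat();
        System.out.println(format.render(node)); // prints cTable = 1.0;
        format = new LispFormat();               // switch formats; the tree is unchanged
        System.out.println(format.render(node)); // prints (setq cTable 1.0)
    }
}
```

Because the node stores no text of its own, switching formats changes only what is displayed, not the program tree.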
The system also indicates the currently selected portion of the program tree to a programmer by highlighting the corresponding display representation of the program tree. Similarly, the system indicates the current insertion point to a programmer by displaying an insertion point mark (e.g., “|” or “^”) within the displayed representation. The system also allows the programmer to select a new current portion or re-position the insertion point based on the display representation.
The editing facilities of the system typically allow new nodes to be inserted only relative to sibling nodes. For example, a node can be added before or after a selected sibling node. The first child node cannot be added this way, since there are no siblings to select. As a result, the system may automatically add a child node whenever a non-leaf parent node is added to the program tree. For example, when a binary operator node is added to the program tree, the system adds at least one child node as an operand. The type of the child node is “to be determined” because the system does not know the type of operand that the programmer wants. The system then allows the programmer to change the type of the node. Although the automatic adding of a child node allows a child node to be added without any sibling nodes, some programmers would prefer to have a way to add child nodes without using nodes with a “to be determined” type.
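The sibling-relative insertion and the placeholder child can be sketched as follows. EditNode, insertAfter, and newOperator are hypothetical names used only to illustrate the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical editable tree node supporting only sibling-relative insertion. */
final class EditNode {
    static final String TBD = "to be determined";

    String type;
    EditNode parent;
    final List<EditNode> children = new ArrayList<>();

    EditNode(String type) { this.type = type; }

    /** Inserts newNode immediately after this node under the same parent. */
    void insertAfter(EditNode newNode) {
        if (parent == null) throw new IllegalStateException("node has no parent, so it has no siblings");
        int index = parent.children.indexOf(this);
        parent.children.add(index + 1, newNode);
        newNode.parent = parent;
    }

    /** Creates a non-leaf node with one placeholder child, since a first child cannot be added relative to a sibling. */
    static EditNode newOperator(String type) {
        EditNode operator = new EditNode(type);
        EditNode placeholder = new EditNode(TBD);
        placeholder.parent = operator;
        operator.children.add(placeholder);
        return operator;
    }

    public static void main(String[] args) {
        EditNode plus = newOperator("+");
        EditNode first = plus.children.get(0);
        first.type = "variable";                    // programmer changes the placeholder's type
        first.insertAfter(new EditNode("constant")); // second operand added relative to its sibling
        for (EditNode child : plus.children) System.out.println(child.type);
    }
}
```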
FIG. 1 is a diagram illustrating a portion of a program tree corresponding to the definition of a method. The method is defined by the following:
public static int increment (int i){i++; return i;}
Node 101 corresponds to the root of the sub-tree of the program tree representing the “increment” method. Nodes 102–108 are child nodes of the root node. Nodes 109 and 110 are child nodes of node 106, node 111 is a child node of node 107, and node 112 is a child node of node 108. Each node has a reference to another node in the program tree that defines the node type. For example, node 107 references (e.g., “statement”) a declaration node that defines a statement, and node 111 references (e.g., indicated by the dashed line) node 106 for the parameter “i.” Such referenced nodes are also referred to as declaration definitions.
A node of a certain node type may have a variable number of child nodes. For example, the “increment” method has seven child nodes. The references to child nodes may be stored in an array of references of the parent node. For example, entry 1 of the array may reference the child node of type “name,” entries 2 and 3 may reference the child nodes of type “modifier,” and so on. To identify which child nodes of a parent node have a certain node type, however, the system would typically need to access each child node of the parent node. It would be desirable if the types of child nodes could be identified without having to access each child node.
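The cost of this lookup can be sketched as a linear scan over the parent's array of child references. The class names are hypothetical, and the node types assigned to entries 4 through 7 ("type," "parameter," and two "statement" entries) are assumptions, since the text only specifies entries 1 through 3.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical child node that knows its own node type. */
final class ChildNode {
    final String type;
    ChildNode(String type) { this.type = type; }
}

/** Hypothetical parent node holding an array of references to its children. */
final class ParentNode {
    final ChildNode[] children;
    int accesses = 0; // counts how many child nodes were inspected

    ParentNode(ChildNode... children) { this.children = children; }

    /** Returns the 1-based array entries whose child has the given type; every child is accessed. */
    List<Integer> entriesOfType(String type) {
        List<Integer> entries = new ArrayList<>();
        for (int i = 0; i < children.length; i++) {
            accesses++; // the type is only known by looking at the child itself
            if (children[i].type.equals(type)) entries.add(i + 1);
        }
        return entries;
    }

    /** Builds the seven children of the "increment" method node (entries 4 to 7 are assumed). */
    static ParentNode incrementMethod() {
        return new ParentNode(
                new ChildNode("name"), new ChildNode("modifier"), new ChildNode("modifier"),
                new ChildNode("type"), new ChildNode("parameter"),
                new ChildNode("statement"), new ChildNode("statement"));
    }

    public static void main(String[] args) {
        ParentNode method = incrementMethod();
        System.out.println(method.entriesOfType("modifier")); // prints [2, 3]
        System.out.println(method.accesses);                  // prints 7
    }
}
```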
The system also needed to track various groupings of node types. For example, the formal parameters of a method may have node types of input parameter, output parameter, or input/output parameter. The system needed to be programmed with knowledge that these three different node types were formal parameters. Thus, whenever the system needed to identify the child nodes representing the formal parameters, it would check each child node of the method node to see if the node type of the child node matched one of these three node types. Because such knowledge was embedded in the system, the system needed to be modified whenever such groupings changed, whenever new groupings were added, and whenever new node types were added to a group. It would be desirable to avoid such modifications to the system.
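The embedded grouping knowledge can be sketched as a hard-coded set of node types. All names here are hypothetical, and the method node with children "copy," "src," "dst," and "body" is an invented example rather than one from the figures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical child of a method node, with a node type and a name. */
final class MethodChild {
    final String type;
    final String name;
    MethodChild(String type, String name) { this.type = type; this.name = name; }
}

final class GroupingSketch {
    /** Hard-coded grouping: the system itself must know which node types count as formal parameters. */
    static final Set<String> FORMAL_PARAMETER_TYPES = new HashSet<>(Arrays.asList(
            "input parameter", "output parameter", "input/output parameter"));

    /** Checks every child of the method node against the hard-coded group. */
    static List<String> formalParameterNames(List<MethodChild> children) {
        List<String> names = new ArrayList<>();
        for (MethodChild child : children) {
            if (FORMAL_PARAMETER_TYPES.contains(child.type)) names.add(child.name);
        }
        return names;
    }

    /** Invented example: a method with one input and one output parameter. */
    static List<MethodChild> exampleChildren() {
        return Arrays.asList(
                new MethodChild("name", "copy"),
                new MethodChild("input parameter", "src"),
                new MethodChild("output parameter", "dst"),
                new MethodChild("statement", "body"));
    }

    public static void main(String[] args) {
        System.out.println(formalParameterNames(exampleChildren())); // prints [src, dst]
    }
}
```

In this sketch, adding a new kind of formal parameter means editing FORMAL_PARAMETER_TYPES, which is the kind of system modification the passage above identifies as undesirable.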