1. Field of Invention
This invention relates to the field of context sensitive text displays. In particular, this invention relates to determining the display characteristics of text efficiently from the parse node containing the text in a parse derived from the text.
2. Description of Background Art
Computers have long been used to display text to humans to facilitate editing and understanding. Typically, a human uses a computer to manipulate or view a block of text. For purposes of illustration, a block of text will be assumed to be simply an array of characters.
In a modern computer text editor or text display tool, displaying text on a computer screen involves several steps. First, the tool needs to identify where in the text array the display should begin. This amounts to having an integer that indicates the starting position in the array. This information can be provided to the tool from the human being by keystrokes indicating a line number or a scroll bar indicating the relative position in the file. Next, the tool needs to "paint" the screen with the characters. Conceptually, the tool gets a character at the indexed point, determines the screen display characteristics, such as color, font, size, or format, associated with that character at that point in the text array, and provides that information to the graphical display hardware. The computer then increments the index and repeats the process with the next character.
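The painting loop described above can be sketched in C roughly as follows. This is only an illustrative sketch, not the code of any particular editor: the names display_attr, attr_fn, draw_fn, paint_text, plain_attr, text_sink, and sink_draw are all hypothetical, and the "display hardware" is stood in for by a callback that appends characters to a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical display characteristics for one character. */
typedef struct {
    int color;
    int font;
    int size;
} display_attr;

/* Hypothetical callbacks: one determines the display characteristics at a
   position in the text array, the other hands a character and its
   characteristics to the display. */
typedef display_attr (*attr_fn)(const char *text, size_t index);
typedef void (*draw_fn)(char c, display_attr attr, void *ctx);

/* Paint at most max_chars characters of text[0..length), beginning at the
   integer index start. Returns the number of characters painted. */
size_t paint_text(const char *text, size_t length, size_t start,
                  size_t max_chars, attr_fn get_attr, draw_fn draw, void *ctx)
{
    size_t painted = 0;
    size_t i;

    assert(text != NULL && get_attr != NULL && draw != NULL);
    for (i = start; i < length && painted < max_chars; i++) {
        display_attr attr = get_attr(text, i); /* characteristics at this point */
        draw(text[i], attr, ctx);              /* hand off to the display */
        painted++;                             /* then move to the next character */
    }
    return painted;
}

/* Example attribute callback (assumption): every character looks the same. */
display_attr plain_attr(const char *text, size_t index)
{
    display_attr a = { 0, 0, 12 };
    (void)text;
    (void)index;
    return a;
}

/* Example "display" (assumption): collect painted characters in a buffer. */
typedef struct {
    char *buf;
    size_t used;
    size_t cap;
} text_sink;

void sink_draw(char c, display_attr attr, void *ctx)
{
    text_sink *s = ctx;
    (void)attr;
    if (s->used + 1 < s->cap) {
        s->buf[s->used++] = c;
        s->buf[s->used] = '\0';
    }
}
```

In this sketch, a context-sensitive tool would differ from a conventional one only in what the get_attr callback computes; the cost of that callback is paid once per painted character, which is why it must be fast.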
Conventional text editors require the user to specify the display characteristics associated with particular points in the text without regard to what the particular text is or "means." However, in certain applications, the text may have context that the computer could use to determine the display characteristics. For example, computer programs are written in computer languages like C, Pascal, or Cobol. A program written in a computer language has text strings within it that have a particular meaning or significance, such as keywords like "if", "else", or "while". A more sophisticated text display tool could use this context-sensitive information within the text to define the display characteristics of the text. Borland's C++ development environment for DOS machines provides a text editor that, for example, displays keywords in one color and comments in another color.
Using the context of the text to determine the display characteristics presents computational performance problems. The user wants to see the text displayed on the computer screen promptly. However, setting the display characteristics as a function of the context requires that the computer compute the context. That computation can be done at the time the display is requested, or, alternatively, the context computation could be performed at some other, earlier time and the result stored. In either case, at the time the display is requested, the computer has to perform some computation, either directly or by finding a previous result. This computation takes time. If this time is too long, then the text will not be displayed promptly, thus frustrating the user. Economically viable context-sensitive text displays therefore must efficiently store and compute the context.
A review of the process of extracting context out of certain kinds of text will clarify the problems a context-sensitive display has. One place that context extraction occurs is in the process of generating executable machine code from the text of a computer program written in a high level language. This process is called compilation. Compilation involves several steps: lexical analysis, parsing, code generation, and optimization. Only lexical analysis and parsing are relevant for purposes of displaying context-sensitive information. Lexical analysis involves breaking the text into distinct, non-overlapping text strings in accordance with the rules of a specified language. These non-overlapping text strings are referred to as tokens. The lexical analyzer can characterize some of these text strings because the text matches a keyword, or is a number or a symbol such as "&". Lexical analysis is computationally fast because it relies only on "local" information to determine where the next token begins and ends.
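To make the "local" nature of lexical analysis concrete, the following is a minimal sketch in C of a scanner that breaks a character array into non-overlapping tokens. It is a toy written for illustration, not a lexer for any real language: the token kinds, the three-entry keyword table, and the function names next_token and is_keyword are assumptions, and comments, string literals, and multi-character symbols are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOK_END, TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_SYMBOL
} token_kind;

typedef struct {
    token_kind kind;
    size_t start;   /* index of the token's first character */
    size_t length;  /* number of characters in the token */
} token;

/* Illustrative keyword table (assumption): just the three keywords
   named in the text. */
static const char *const keywords[] = { "if", "else", "while" };

static int is_keyword(const char *s, size_t len)
{
    size_t k;
    for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
        if (strlen(keywords[k]) == len && strncmp(keywords[k], s, len) == 0)
            return 1;
    }
    return 0;
}

/* Scan the next token at or after *pos in text[0..length) and advance
   *pos past it. Only the characters of the token itself are examined,
   which is what makes the scan "local". */
token next_token(const char *text, size_t length, size_t *pos)
{
    token t = { TOK_END, 0, 0 };
    size_t i;

    assert(text != NULL && pos != NULL);
    i = *pos;
    while (i < length && isspace((unsigned char)text[i]))
        i++;                                   /* skip white space */
    t.start = i;
    if (i >= length) {                         /* no more tokens */
        *pos = i;
        return t;
    }
    if (isalpha((unsigned char)text[i]) || text[i] == '_') {
        while (i < length &&
               (isalnum((unsigned char)text[i]) || text[i] == '_'))
            i++;
        t.kind = is_keyword(text + t.start, i - t.start)
                     ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)text[i])) {
        while (i < length && isdigit((unsigned char)text[i]))
            i++;
        t.kind = TOK_NUMBER;
    } else {
        i++;                                   /* single-character symbol */
        t.kind = TOK_SYMBOL;
    }
    t.length = i - t.start;
    *pos = i;
    return t;
}
```

Calling next_token repeatedly on the text "while (x1 & 42)" would yield, in order, a keyword, a symbol, an identifier, a symbol, a number, and a symbol. Each call does work proportional to the length of the token it returns, so the whole scan is linear in the amount of text examined.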
After the lexical analyzer breaks the text into tokens, a parser converts the entire sequence of tokens into a parse tree that describes the computer program's structure. From this parse tree, the code generator can create blocks of executable code corresponding to the nodes of the parse tree. The code generator can then link the blocks together to make an executable program. An optimizer can then look for improvements to make in each block to reduce the amount of memory required to store the executable code or to decrease the run time of the executable code. The nature and mathematical structure of compilers is described in Compilers: Principles, Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman, 1986, ISBN 0-201-10088-6, which is hereby incorporated by reference. The steps for building a compiler are shown in Introduction to Compiler Construction with UNIX by Axel T. Schreiner and H. George Friedman, Jr., published by Prentice-Hall, 1985, ISBN 0-13-474396-2, which is hereby incorporated by reference.
If it is only desired to modify the display characteristics of the text depending solely on information that can be obtained "locally" from a lexical analyzer, then a text display tool can perform lexical analysis at the time it scans the text for the display. For common computer languages, this would allow the display tool to paint different keywords in different colors without a noticeable performance problem. However, this may not produce enough information for the user.
Changing the display characteristics of the text in accordance with information associated with the corresponding parts of a parse tree can be very useful. However, because constructing a parse tree requires processing all of the text, a tool using the parse tree to determine the display characteristics of the text will probably not be able to compute the entire parse tree every time the display needs to be changed. Therefore, such a display tool will generally require that the parse tree be generated once and stored, and then accessed when the need arises. However, conventional parse trees occupy significant amounts of memory, and require a substantial amount of time to build.
The memory requirements of a conventional parse tree are straightforward to compute. Consider a text block consisting of N characters. Experience has shown that a file of N characters will generate a parse tree with about N/4 nodes. The node data structure of a conventional parse tree, written in C, could be declared as follows:
    typedef struct pt_node_struct {
        struct pt_node_struct *parent, **children;
        int child_count;
        char *start_position, *end_position;
        int parse_tree_id;
        /* other stuff */
    } tree_node_struct, *pt_node;
This data structure provides a pointer to the parent node, a pointer to an array of pointers to the children of the node, an integer for counting the number of children, pointers for indicating where, in the text block, the text that generated this node begins and ends, and an integer to indicate what number node this is. In a conventional 32-bit machine, each node will require one word for the parent pointer, one word to point at the array of children, one word for child_count, one word each for start_position and end_position, and one word for parse_tree_id. In addition, each node will require an additional word in its parent's array of children. Therefore, each parse node will require 7 words of memory, and this conventional tree structure will require 7N/4 words to store the parse tree for an N character file. In a conventional machine, there are 4 bytes per word, thus making the conventional data structure consume 7N bytes.
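The arithmetic above can be checked with a short C sketch. The constants simply restate the figures given in the text (7 words per node, 4 bytes per word, about one node per 4 characters); the function and constant names are hypothetical, and the result is an estimate under those stated assumptions rather than a measurement of any real parser.

```c
#include <assert.h>
#include <stddef.h>

enum {
    WORDS_IN_NODE = 6,          /* parent, children, child_count,
                                   start_position, end_position,
                                   parse_tree_id */
    WORDS_IN_PARENT_ARRAY = 1,  /* this node's slot in its parent's array */
    WORDS_PER_NODE = WORDS_IN_NODE + WORDS_IN_PARENT_ARRAY,
    BYTES_PER_WORD = 4,         /* 32-bit machine */
    CHARS_PER_NODE = 4          /* about N/4 nodes for N characters */
};

/* Estimated number of parse nodes for a text block of n_chars characters
   (integer division, so the estimate rounds down). */
size_t parse_tree_nodes(size_t n_chars)
{
    return n_chars / CHARS_PER_NODE;
}

/* Estimated bytes consumed by the conventional parse tree. */
size_t parse_tree_bytes(size_t n_chars)
{
    /* Guard against overflow of the multiplication below. */
    assert(n_chars <= (size_t)-1 / WORDS_PER_NODE);
    return parse_tree_nodes(n_chars) * WORDS_PER_NODE * BYTES_PER_WORD;
}
```

For example, under these assumptions a 100,000 character file yields an estimate of 25,000 nodes and 700,000 bytes, which is seven times the size of the text itself.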