Computer program editors are specialized editors that allow computer program authors, or programmers, to create and modify computer programs. In addition to supporting basic word-processing functions such as block copy/move and string find/replace, most computer program editors provide additional editing features that are especially valuable to the programmer. These include highlighting important structural constructs (e.g., conditional statements, keywords, and brackets) and providing language-sensitive commands for navigating, modifying, and reformatting, or "prettyprinting," programs. Although existing program editors try to provide similar functions, they fall into one of two classes according to how they internally represent the program being edited: as streams of text (text-oriented editors) or as syntax trees (structure-oriented editors).
The most common design approach for program editors is to begin with a text editor and add the features described above, based on linguistic (or structural) knowledge of the program being entered. This type of editor internally represents the program being edited as a "flat" text file in which the editable characters correspond, for the most part, one-to-one with the keystrokes made by the program author. In such an editor, the display of the program also corresponds one-to-one with the user's keystrokes. Thus, if the user types a carriage return to break a line of text or begin a new line, a corresponding carriage return/line feed is entered into the text file and displayed on the monitor. Similarly, if the user of a text-based editor hits the spacebar twice in a row, the editor enters two editable spaces into the text file, and both are displayed on the editing screen. This is the approach taken by GNU Emacs, which provides a multitude of "modes" that specialize its behavior for programs written in various programming languages. It is also the approach taken by "vi", another common editor, which provides fewer language-specialized services than Emacs. Finally, most PC-based program editors use this text-based approach.
The chief advantage of text-oriented editors is familiarity: they give all users free access to the conventional text-editing functionality to which they are accustomed, which provides great flexibility and minimizes learning costs. However, most text editors do not perform full linguistic analysis and therefore have only incomplete and unreliable information about the linguistic structure of the programs being edited. For example, many language-based services in Emacs discover structure by matching regular expressions, which cannot recognize arbitrarily nested constructs such as balanced brackets and therefore cannot capture full linguistic information. Such weak-analysis approaches offer very little in the way of error diagnosis. Moreover, text editors lack the structural information needed to support robust, on-the-fly prettyprinting of the program being entered.
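As a minimal illustration of this limitation, the Python sketch below pairs an opening parenthesis with its closing partner in two ways. The function names are hypothetical and are not taken from Emacs. The first uses a regular expression that, like a simple regex-driven mode, can only take the nearest closing parenthesis; the second tracks nesting depth. On nested input the regular expression pairs the wrong brackets, because no true regular expression can track unbounded nesting.

```python
import re

# Regex-driven matcher: pair the "(" at open_index with the first ")" after it.
# A true regular expression cannot track unbounded nesting depth, so it cannot
# reliably find the real partner of a bracket.
_NAIVE_PAREN = re.compile(r"\([^)]*\)")


def regex_match_paren(text: str, open_index: int) -> int:
    m = _NAIVE_PAREN.match(text, open_index)
    return m.end() - 1 if m else -1


def depth_match_paren(text: str, open_index: int) -> int:
    """Track nesting depth so the "(" is paired with its true partner.
    Parentheses inside string literals or comments are not treated
    specially in this sketch."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1  # unbalanced: no partner found


code = "f(g(x), y)"
print(regex_match_paren(code, 1))  # 5: the ")" that closes g(x)
print(depth_match_paren(code, 1))  # 9: the ")" that actually closes f(...)
```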
More aggressive text-oriented program editors have been proposed that maintain both a full textual representation and a full tree representation of the program being entered. See, for example, the inventor's doctoral dissertation, "User Interaction in Language-Based Editing Systems," UCB/CSD-93-726, Ph.D. Dissertation, Computer Science Division, EECS, University of California, Berkeley, December 1992. This thesis describes a research system in this category, but there are no known commercial versions. Such aggressive text editors exact a very high engineering overhead, because they must build a mapping between related parts of the two representations (i.e., which part of the text corresponds to which part of the structure tree) and maintain that mapping as the user types. More importantly, these systems share a fault with structure editors: they provide no useful language-oriented services for those parts of the program that are in the midst of being composed or edited (newly typed text must be analyzed before the system knows anything about it), and those services are of very little use in the presence of syntax errors.
An alternative approach has been explored by a series of research systems under the general category of "structure editors." Two principles are central to this approach: (1) programs are represented internally as syntax trees, and (2) the user's primary mode of interaction with the program is assumed to be in terms of that underlying structure. For an early research statement, see Tim Teitelbaum and Thomas Reps, "The Cornell Program Synthesizer: A Syntax-Directed Programming Environment," Communications of the ACM 24, 9 (September 1981), 563-573. The commercial descendant of that system is described in Thomas Reps and Tim Teitelbaum, The Synthesizer Generator Reference Manual, third edition, Springer-Verlag, Berlin, 1989. All practical systems of this sort (for programs) are actually hybrids, meaning that they permit ordinary textual editing under some circumstances. Even so, those circumstances are still expressed in terms of the underlying structure. For example, the user might select a statement structurally, implicitly asking that it be treated temporarily as text, and then edit the selected text. When the user considers the edit complete, the system converts the revised text back into structure if it can. The advantage of this approach is that the editor has access to complete, reliable linguistic information with which to drive services such as prettyprinting.
Structure editors have several failings. First, they restrict the freedom to edit textually that users expect; experience shows that this has severely limited their acceptance. Second, they provide no useful language-oriented services for those parts of the program being edited textually; those services cannot be provided until the user has eliminated all syntax errors in the edited parts and the system has analyzed the text to restore the tree structure. Thus, while the user is editing a region, the editor offers no help at all in that region. Finally, structure editors typically provide very little support in the presence of syntax errors, since most of their language-based services require a well-formed syntax tree in order to operate. Yet while the user is editing textually, the program is syntactically ill-formed most of the time, so it is next to impossible to maintain a well-formed syntax tree of the program as it is being entered.
Thus, there is a need for a program editor that combines the advantages of both types of program editors: the full freedom of textual editing together with first-class, reliable structural information that is available all of the time (not just for parts of the program, and not just when no errors are present). More specifically, there is a need for a program editor that provides a single, non-textual internal program representation for most services (e.g., language support and prettyprinting) that is also suitable for programs in progress, i.e., programs that are syntactically ill-formed and that contain program words that are either lexically ill-formed or incomplete versions of legal lexemes. This internal program representation should record the words of the program, their absolute positions, and their extended lexical types, including whether a word is an ill-formed or incomplete lexeme.
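One way to picture such a representation is as a sequence of tokens, each carrying its text, its absolute character position, and an extended lexical type that adds "incomplete" and "ill-formed" to the ordinary lexical classes. The Python sketch below assumes a toy language: the keyword set, operator set, and lexeme patterns are hypothetical, only unterminated strings are recognized as incomplete, and words are split at whitespace for simplicity, so string literals containing spaces are not handled. An unterminated string such as `"abc` is classified as incomplete because it is a prefix of a legal string literal, while `9q` is classified as ill-formed.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class LexType(Enum):
    """Extended lexical types: ordinary lexical classes plus two error classes."""
    KEYWORD = auto()
    IDENTIFIER = auto()
    INTEGER = auto()
    STRING = auto()
    OPERATOR = auto()
    INCOMPLETE = auto()  # a prefix of some legal lexeme (here: an unterminated string)
    ILL_FORMED = auto()  # not a legal lexeme and not a prefix of one


@dataclass(frozen=True)
class Token:
    text: str
    position: int  # absolute character offset of the word in the program
    lex_type: LexType


# Hypothetical lexical rules for a toy language.
KEYWORDS = {"if", "else", "while", "return"}
OPERATORS = {"=", "==", "+", "-", "*", "(", ")", ";"}


def classify(word: str) -> LexType:
    if word in KEYWORDS:
        return LexType.KEYWORD
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", word):
        return LexType.IDENTIFIER
    if re.fullmatch(r"[0-9]+", word):
        return LexType.INTEGER
    if re.fullmatch(r'"[^"]*"', word):
        return LexType.STRING
    if word in OPERATORS:
        return LexType.OPERATOR
    if re.fullmatch(r'"[^"]*', word):
        return LexType.INCOMPLETE  # opening quote typed, closing quote not yet
    return LexType.ILL_FORMED


def scan(text: str) -> List[Token]:
    """Split at whitespace (a simplification) and classify each word."""
    return [Token(m.group(), m.start(), classify(m.group()))
            for m in re.finditer(r"\S+", text)]


for t in scan('x = "abc'):
    print(t.position, t.text, t.lex_type.name)
```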
It would be desirable for such an editor to maintain this internal representation on the fly, as the user types, even in the presence of the inevitable program syntax errors. The editor should also be able to prettyprint the program being entered from this internal representation, as the user types, where prettyprinting involves (1) typesetting each token based on the structural information described above, including whether a particular program word is ill-formed or illegal, and (2) displaying varying amounts of whitespace between program words, determined not by spaces entered by the user but by aesthetic and ergonomic rules based on the context of the displayed program words adjacent to the whitespace.
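The whitespace rule in item (2) can be sketched as a function of the two tokens on either side of a gap, so that the displayed spacing depends only on context and never on spaces the user typed. In the Python sketch below, the token kinds and the specific rules (no space after `(` or before `,`, `)`, `;`; no space between an identifier and a following `(`; one space elsewhere) are illustrative assumptions, not rules stated in this document.

```python
from typing import List, Tuple

# (text, kind) pairs; the kind names are illustrative assumptions.
Tok = Tuple[str, str]


def gap(left: Tok, right: Tok) -> int:
    """Hypothetical rule table: spaces to display between two adjacent tokens."""
    l_text, l_kind = left
    r_text, _ = right
    if r_text in {")", ";", ","}:
        return 0  # no space before closing punctuation
    if l_text == "(":
        return 0  # no space after an opening parenthesis
    if r_text == "(" and l_kind == "identifier":
        return 0  # function call: f(x), not f (x)
    return 1      # default: one space between words and around operators


def prettyprint(tokens: List[Tok]) -> str:
    """Render tokens, deriving every gap from context rather than user input."""
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            out.append(" " * gap(tokens[i - 1], tok))
        out.append(tok[0])
    return "".join(out)


call = [("x", "identifier"), ("=", "operator"), ("f", "identifier"),
        ("(", "punct"), ("y", "identifier"), (",", "punct"),
        ("1", "integer"), (")", "punct"), (";", "punct")]
print(prettyprint(call))  # x = f(y, 1);
```

The same `(` is displayed with no preceding space after the identifier `f` but with one space after the keyword `if`; the spacing is a property of the context, not a stored character.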
A similar goal has been pursued in natural-language WYSIWYG (What You See Is What You Get) word processors, which typeset the document being edited on the fly, as the user types. However, many of these systems simply treat inter-word space in the conventional way, as "whitespace characters" that the user may enter and manipulate exactly like any other characters. Other systems offer an abstraction of this behavior (e.g., the "smart spaces" option in FrameMaker) by allowing a user to enter no more than one "space" between words; in this case the space is not a true character (indeed, the visual whitespace between words may vary as typesetting is performed) but behaves conceptually as a "word separator." This latter approach is not applicable to program editing, where the lexical classes of the program "words" dictate that displayed whitespace has a variety of internal representations depending on the editing context. Thus, it would also be desirable for a program editor adopting such an approach to let the user edit the non-textually represented program as if it were text, via a cursor that behaves much like a text editor's cursor but also conveys its editing context, including the program words and whitespace adjacent to it.
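The point about context-dependent whitespace can be made concrete with a single space keystroke. The Python sketch below assumes a hypothetical editing policy, not one specified in this document: tokens are stored without any whitespace, and the cursor is given as a token index plus a character offset into that token. Pressing space inside a string literal inserts an ordinary character, pressing it inside any other token splits that token in two, and pressing it at a token boundary changes nothing, because the separator there is implied by the representation rather than stored.

```python
from typing import List


def press_space(tokens: List[str], i: int, offset: int) -> List[str]:
    """Hypothetical policy for a space keystroke with the cursor `offset`
    characters into tokens[i]. Returns the new token list."""
    tok = tokens[i]
    is_string = tok.startswith('"')
    closed = is_string and len(tok) >= 2 and tok.endswith('"')
    inside = 0 < offset < len(tok)
    open_end = is_string and not closed and offset == len(tok)
    if is_string and (inside or open_end):
        # Inside a string literal, a space is an ordinary character.
        return tokens[:i] + [tok[:offset] + " " + tok[offset:]] + tokens[i + 1:]
    if inside:
        # Inside any other token, a space separates it into two tokens.
        return tokens[:i] + [tok[:offset], tok[offset:]] + tokens[i + 1:]
    # At a token boundary the separator already exists; nothing is stored.
    return list(tokens)


print(press_space(["count", "=", "1"], 0, 3))  # ['cou', 'nt', '=', '1']
print(press_space(["count", "=", "1"], 0, 5))  # ['count', '=', '1']
```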
Such an editor should provide an optional, full program structural analysis capability (a parser) that adds useful program-interpretation information that cannot be gleaned from the internal representation alone. This information should represent the syntax of the program being entered; however, because full program analysis cannot generate useful information in the presence of syntax errors, it should not be allowed to interfere with the user's editing and should be invoked only when the user requests it.
Finally, such an editor should permit a user to force horizontal alignment of tokens across associated lines of text, so that the prettyprinted program is easier to read on the editor display. Such alignment should be intuitive to use and require little or no additional overhead to implement.
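A minimal sketch of such alignment follows, assuming that each line is already a list of token strings and that the user has chosen `=` as the alignment token (both are assumptions for illustration, and `align_on` is a hypothetical name). It pads the text before the first `=` on each line so that every `=` starts in the same column; lines without the alignment token are rendered unchanged.

```python
from typing import List


def align_on(lines: List[List[str]], anchor: str = "=") -> List[str]:
    """Render token lines with single spaces, padding so that the first
    `anchor` token on every line that has one starts in the same column."""
    # Width of the text preceding the anchor on each line that contains it.
    widths = [len(" ".join(toks[:toks.index(anchor)]))
              for toks in lines if anchor in toks]
    target = max(widths, default=0)
    sep = " " if target else ""
    rendered = []
    for toks in lines:
        if anchor in toks:
            i = toks.index(anchor)
            head = " ".join(toks[:i]).ljust(target)
            rendered.append(head + sep + " ".join(toks[i:]))
        else:
            rendered.append(" ".join(toks))
    return rendered


for line in align_on([["x", "=", "1"],
                      ["count", "=", "2"],
                      ["total_sum", "=", "x", "+", "count"]]):
    print(line)
```

Because the padding is computed from the tokens rather than typed by the user, the alignment is recomputed automatically whenever a token on any of the associated lines changes length.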