1. Field of the Invention
The present invention relates generally to interactive software engineering tools including editors for source code such as a programming or mark-up language, and more particularly to a language-based editing architecture with support for embedded lexical contexts.
2. Description of the Related Art
Source code plays a major role in most software engineering environments, yet it has always been difficult for software engineers to interact with such code. A common problem encountered by programmers when interacting with source code is that prior art program editors, while often language-sensitive, build upon fragile representations that are ill-suited to the manipulation of multiple embedded syntactic structures, for which behaviors would ideally be defined in accordance with distinct lexical rules. As a result, such prior art program editors are typically limited in their specialization of behaviors and those specialized behaviors provided often exhibit instability in the face of interactive edits. As will now be described, a particularly common instability often results in confusing language-based display (e.g., text formatting or pretty printing, linguistically-driven typography, etc.) and can result in the loss of previous linguistic analysis of program structures being entered.
FIG. 1 depicts a conventional computing system 10 with which a programmer or other user may enter or edit source code. System 10 includes a computer system 20 having at least one CPU 30 and memory (MEM) 40 contained therein. Memory 40 typically includes both volatile and non-volatile memory or storage. A portion of the volatile memory is typically used as a text buffer 50. The programmer or other user enters source code into computer system 20, typically using a keyboard 60 and often augmented by use of a mouse or trackball 70. In general, text buffer 50 is used to represent information, sometimes referred to as a data model.
Typically, software executable on CPU 30 provides functionality of an editor environment, including display and other functionality appropriate to a particular language context for which the software is intended. Sometimes, editor or editor-like facilities may be provided within other software environments. For example, editor facilities are commonly provided within integrated software engineering tools or environments, including within source level debuggers, source analyzers, viewers, etc. Furthermore, such editor facilities may be provided or embedded within other types of systems, e.g., to support scripting or macro language facilities of a word processing, publishing, spreadsheet or other application. In each case, editor facilities provide display and/or rendering functionality, which is often implemented as software executable by computer system 20, and which displays or renders characters, symbols or graphics corresponding to the information represented in text buffer 50.
Often, display functionality renders information represented in text buffer 50 according to stylistic rules appropriate for a particular language type (e.g., for a comment, string literal or tag). For example, portions of text may be rendered as display 90 on monitor 80 using particular typefaces, font sizes, colors and/or attributes that are appropriate or conventional for a particular language type. In general, such functionality operates on contents of text buffer 50 and applies what is believed, rightly or wrongly, to be the appropriate stylistic rules. Unfortunately, associations between particular contents of text buffer 50 and appropriate stylistic rules are typically quite fragile, particularly in the presence of editing operations. Accordingly, if provided, language-based display (e.g., text formatting or pretty printing, linguistically-driven typography, etc.) is somewhat unstable in prior art editor designs. This instability will be better understood in the context of the following example.
Referring now to FIG. 2A, assume that a programmer enters the following keystrokes: S=E+“}”; using system 10. Corresponding contents of text buffer 50 are illustrated in FIG. 2A and a corresponding display (e.g., display 90) is rendered to monitor 80 as shown in FIG. 1.
A typical prior art code-oriented editing environment recognizes language constructs of entered text by performing pattern matching or lexical (language) analysis on the contents of text buffer 50. Referring illustratively to the contents shown in FIG. 2A, a pattern matching or lexical analysis facility recognizes the group 110 of characters or tokens, namely “}”, as a string literal. Having properly recognized the string literal from amongst the remaining source language contents of text buffer 50, a prior art editing environment of system 10 may apply special typographical attributes such as special coloring when displaying characters or symbols of the string literal on monitor 80.
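The pattern-matching recognition described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular editor: the names `STRING_LITERAL` and `find_string_literals` are hypothetical, escape sequences are ignored, and ASCII double quotes stand in for the typographic quotes printed in this description.

```python
import re

# Hypothetical pattern: a double-quoted string literal on one line,
# with no support for escape sequences.
STRING_LITERAL = re.compile(r'"[^"\n]*"')


def find_string_literals(buffer: str) -> list[tuple[int, int, str]]:
    """Return (start, end, text) for each string literal found in the buffer."""
    return [(m.start(), m.end(), m.group()) for m in STRING_LITERAL.finditer(buffer)]


# Buffer contents corresponding to the keystrokes of FIG. 2A.
spans = find_string_literals('S=E+"}";')
for start, end, text in spans:
    print(f"string literal {text!r} at [{start}, {end})")
```

An editor of this kind would then apply its string-literal styling to each reported span.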
Assume now that the programmer wishes to edit the current line so it will eventually read as follows: S=“{”+E+“}”;
To carry out the above edit, the programmer typically repositions cursor 100 on display 90 so that a corresponding insertion point exists before the existing character E, then begins typing the additional text (namely, the characters “{” +) using keyboard 60. Underlining is used herein as a notational convention to better delimit the relevant characters. The initial double quote character “ is the first such character entered and FIG. 2B illustrates the state of text buffer 50 after its entry, but before remaining characters of the desired edit are entered. The illustrated state highlights the fragility of many prior art editor implementations. Typically, such editor systems are unable to properly handle display of text buffer 50 contents after entry of an initial double quote character “ that is intended by the programmer to signify start of a new string literal.
Instead, given the text buffer 50 state illustrated in FIG. 2B, a pattern matcher or lexical analyzer of the editor typically re-analyzes the text buffer contents and erroneously assumes that portion 120 of the buffer contents, namely “E+”, itself corresponds to a string literal consisting of the characters E+, where underlining is used herein to better delimit the relevant characters. Accordingly, upon programmer entry of the triggering keystroke “, such an editor makes the erroneous assumption (relative to the programmer's intent) that “E+” is itself a string literal and modifies typographic attributes of display 90 accordingly. For example, the supposed string literal “E+” may be rendered using a fixed point font, using a color and size that have been predefined for string literal rendering. Depending on the implementation, inappropriate visual cues may extend to other portions of text buffer 50 contents. For example, since it no longer appears to be preceded by an opening double quote character “, line portion 130 (namely, the right brace character }) may be improperly interpreted as a code construct, rather than as the contents of a string literal intended by the programmer. Indeed, in an editor implementation that performs lexical analysis or even simple matching of braces, a portion may be interpreted (and visually presented) as an unbalanced brace within a context that requires an opening brace for each closing brace.
Furthermore, because the second occurrence of a double quote character “ is improperly interpreted as a closing double quote character for the string literal “E+”, text buffer 50 contents 140 may also be misinterpreted. For example, a lexical analyzer of the editor may identify the characters “; as an invalid lexeme. In particular, buffer contents 140 may be interpreted as the start of a string literal that lacks a closing double quote character. In some prior art editor implementations, an invalid lexeme may be rendered in such a way, e.g., in red, as to highlight invalidity for the programmer. Unfortunately, such inappropriate visual cues can be quite distracting to the programmer.
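The misinterpretation of the FIG. 2B buffer state can be reproduced with a deliberately naive left-to-right lexer. The sketch below is an assumption made for illustration: the function `naive_lex` and its three token kinds (`code`, `string`, `invalid`) are hypothetical, and ASCII double quotes again stand in for the typographic quotes.

```python
def naive_lex(buffer: str) -> list[tuple[str, str]]:
    """Split a buffer into (kind, text) tokens by pairing quotes left to right."""
    tokens = []
    i, n = 0, len(buffer)
    while i < n:
        if buffer[i] == '"':
            close = buffer.find('"', i + 1)
            if close == -1:
                # An opening quote with no closing quote: an invalid lexeme.
                tokens.append(("invalid", buffer[i:]))
                break
            tokens.append(("string", buffer[i:close + 1]))
            i = close + 1
        else:
            # Everything up to the next quote is treated as ordinary code.
            j = i
            while j < n and buffer[j] != '"':
                j += 1
            tokens.append(("code", buffer[i:j]))
            i = j
    return tokens


before = naive_lex('S=E+"}";')   # state of FIG. 2A
after = naive_lex('S="E+"}";')   # state of FIG. 2B, one quote inserted
print(before)
print(after)
```

After the single inserted quote, the text E+ is reported as a string literal, the right brace as code, and the trailing quote and semicolon as an invalid lexeme, which corresponds to portions 120, 130 and 140 discussed above.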
As the programmer enters additional keystrokes to complete the desired S=“{”+E+“}”; entry, a typical prior art editor will continue to inappropriately interpret text buffer 50 contents. In the above example, upon completion of keystrokes for entry of the string literal “{”, visual cues return to appropriate values. However, in general, a keystroke-by-keystroke interpretation of a given edit may result in an ever changing (and distracting) set of visual cues such as color, typeface or other typographic attributes. While many prior art editors exhibit such inappropriate behaviors, others may simply forgo otherwise desirable language-based features such as advanced program typography or lexical analysis on a keystroke-by-keystroke basis because of the challenges created by interactive edits. To facilitate introduction and use of such features, language-based techniques are needed which exhibit greater stability in the face of interactive edits.
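The keystroke-by-keystroke instability can be illustrated by replaying the edit one character at a time and observing how the right brace is classified at each step. This is a sketch under stated assumptions: `TOKEN`, `kind_at` and `replay` are hypothetical names, the lexer is a simple regular expression rather than that of any real editor, and ASCII double quotes are used.

```python
import re

# Hypothetical token pattern: a closed string, an unterminated string
# (invalid lexeme), or a run of non-quote characters (code).
TOKEN = re.compile(r'(?P<string>"[^"]*")|(?P<invalid>"[^"]*$)|(?P<code>[^"]+)')


def kind_at(buffer: str, index: int):
    """Return the kind of the token covering the character at index."""
    for m in TOKEN.finditer(buffer):
        if m.start() <= index < m.end():
            return m.lastgroup
    return None


def replay(buffer: str, position: int, typed: str) -> list[str]:
    """Return the buffer state before the edit and after each keystroke."""
    states = [buffer]
    for ch in typed:
        buffer = buffer[:position] + ch + buffer[position:]
        position += 1
        states.append(buffer)
    return states


# Insert the characters "{"+ before E, one keystroke at a time.
states = replay('S=E+"}";', 2, '"{"+')
kinds = [kind_at(s, s.rindex('}')) for s in states]
for state, kind in zip(states, kinds):
    print(f"{state:<14} right brace lexed as {kind}")
```

In this sketch the right brace is treated as string content, then as code for two keystrokes, then as string content again once the new literal is closed, so any styling tied to that classification changes repeatedly during a single edit.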
To some degree, inappropriate behaviors can be avoided using language structure-based editor techniques. So-called structure-based editors use internal representations that are closely related to the tree and graph structures used by compilers and other programming tools. While structure-based editors can greatly simplify some kinds of language-oriented services, they generally impose the requirement that the programmer edit using structural, rather than textual, commands. For example, entry of a string literal may require a structural command such as “Insert String Literal” which may be selected from a pull-down menu, bound to a keystroke sequence, such as a control or escape sequence, or invoked by some other non-lexical trigger. In general, such an editing architecture assumes that programs are intrinsically tree structured, and that programmers understand and should manipulate them accordingly. In practice, this assumption has not been borne out and structure editors have not found wide acceptance.
Some structure-based editors allow users to “escape” structural constraints by transforming selected tree regions into plain text, but usability problems persist. The complex unseen relationships between textual display and internal representation make editing operations confusing and somewhat unpredictable because of “hidden state.” In some ways, textual escapes make matters worse with a confusing and distracting distinction between those parts of a program where language-based services are provided and those where they are not. Often, language services and tools stop working until all textual regions are syntactically correct and transformed back to structure.
Unfortunately, due in large measure to practical user acceptance and deeply ingrained motor learning habits that involve textual, rather than structural editing, practical code-oriented text editors emphasize a textual representation. One widely adopted code-oriented text editor, the Emacs editor, uses a purely textual representation, assisted by ad-hoc regular expression matching that can recognize certain language constructs. But, by definition, the structural information computed by simple text editors is incomplete and imprecise, and such editors cannot support services that require true linguistic analysis such as advanced program typography. At best, simple text editors typically provide indentation, syntax highlighting, and navigational services that can tolerate structural inaccuracy. Although high quality, linguistically-driven typography can measurably improve the programmer's reading comprehension, such typography is often lacking in prior art source code editors, especially when encountering malformed and fragmentary program code. Although a few text editors can perform per-line lexical analysis with each keystroke, the absence of true program representation leads to confusion in the inevitable presence of mismatched string quotes and comment delimiters.
In view of the above, techniques are desired whereby interactive software engineering tools may reliably implement behaviors including advanced program typography in accordance with a proper lexical context. In particular, techniques are desired that facilitate stable language-oriented representations in the presence of interactive edits, but which do not force a user to enter structural commands.