The present invention relates to data processing techniques for editing natural language text. More specifically, the invention relates to editing techniques that take punctuation into account.
A number of conventional techniques relating to punctuation of natural language text are discussed in copending, coassigned U.S. patent application Ser. No. 07/274,158 filed Nov. 21, 1988, entitled "Processing Natural Language Text Using Autonomous Punctuational Structure" ("the Trollope application"), and incorporated herein by reference.
Text Editing, VP Series Reference Library, Version 1.0, Xerox Corporation, 1985, pp. 47-56, describes features of the ViewPoint Document Editor available from Xerox Corporation. Pages 49-52 describe the multiple clicking method of selection, in which the number of mouse button clicks indicates the desired unit of text, with one click selecting a character, two clicks selecting a word, three clicks selecting a sentence, and four clicks selecting a paragraph. The editor uses special rules to interpret text as words or sentences. As described at page 49, a selection of a character, word, sentence, or paragraph by the multiple clicking method can be extended by a select adjust method to include additional characters, words, sentences, or paragraphs, respectively. Move, copy, and delete operations can be applied to the selection. In the case of move or copy operations, a selection by the multiple clicking method will be positioned between other text units of the same level, so that a word will be positioned between words, a sentence between sentences, and a paragraph between paragraphs. A selection by the multiple clicking method also includes the preceding or following space or spaces; therefore, a word can be moved or deleted from a sentence, for example, leaving the remaining words and punctuation marks in the sentence correctly spaced; similarly, a sentence can be moved or deleted from a paragraph, leaving the remaining sentences in the paragraph correctly spaced.
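The effect of including an adjacent space in a word selection can be illustrated with a minimal sketch. It is not the ViewPoint implementation; the function names, the treatment of a word as a run of alphanumeric characters, and the preference for the following space over the preceding one are all assumptions made for illustration.

```python
def select_word_with_space(text, pos):
    """Return (start, end) of the word at index pos plus one adjacent space.

    Assumption: a word is a maximal run of alphanumeric characters.
    """
    if not text[pos].isalnum():
        raise ValueError("pos must fall inside a word")
    start = pos
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    end = pos + 1
    while end < len(text) and text[end].isalnum():
        end += 1
    # Assumption: prefer the following space; if the word is followed by
    # punctuation or the end of the text, take the preceding space instead.
    if end < len(text) and text[end] == " ":
        end += 1
    elif start > 0 and text[start - 1] == " ":
        start -= 1
    return start, end


def delete_selection(text, span):
    """Remove the selected span from the text."""
    start, end = span
    return text[:start] + text[end:]


sentence = "The quick brown fox."
mid_word = delete_selection(sentence, select_word_with_space(sentence, 5))
last_word = delete_selection(sentence, select_word_with_space(sentence, 17))
```

In this sketch, deleting a word from the middle of the sentence removes the space after it, while deleting the last word removes the space before it, so the final period stays attached to the preceding word in both cases.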
Various other commercial products have features similar to those of ViewPoint, including selection commands from the keyboard or with a mouse or similar pointer control device. Conventionally, a single click with a pointer control device button selects a region that starts at the character boundary nearest the position of the pointer at the time of the click. In one approach, the region selected by a single click contains no characters, but may be extended one character at a time by moving the pointer over the characters to be added. In another approach, the region selected by a single click contains one character, and may be extended arbitrarily by a single click of a different button with the pointer at the desired ending point of the selection.
It is also conventional to provide selection by double-clicking, or clicking twice in succession with the pointer at the same position. Double-clicking usually selects the word most closely surrounding the pointer position, and subsequent adjustments of the selection are usually made a word at a time. For example, the Macintosh personal computer from Apple Corporation provides a user interface in which multiple clicking selects a word. Word, a commercial text editor from Microsoft Corporation, provides extension of such a selection to additional full words. Microsoft Word and other text editors, including WordPerfect from WordPerfect Corporation and Emacs available with source code from Free Software Foundation, Cambridge, Mass., allow selection of a sentence and extension of such a selection to additional full sentences. Microsoft Word and Fullwrite Professional from Ashton-Tate Corporation further allow selection by paragraph. Fullwrite Professional also allows the user to provide a quotation mark without indicating whether it is at the open or close of a quote, the software correctly providing an open or close quotation mark based on previous marks.
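One simple way to choose between an open and a close quotation mark based on previous marks is to alternate: a mark is treated as opening unless an unmatched opening mark precedes it. The following sketch illustrates that idea only; it is not taken from Fullwrite Professional, and the function name and the strict alternation rule are assumptions.

```python
OPEN_QUOTE = "\u201c"   # left double quotation mark
CLOSE_QUOTE = "\u201d"  # right double quotation mark


def curl_quotes(text):
    """Replace each straight double quote with an open or close mark.

    Assumption: quotation marks strictly alternate, so the choice depends
    only on whether an unmatched open mark has already been seen.
    """
    out = []
    inside_quote = False
    for ch in text:
        if ch == '"':
            out.append(CLOSE_QUOTE if inside_quote else OPEN_QUOTE)
            inside_quote = not inside_quote
        else:
            out.append(ch)
    return "".join(out)


curled = curl_quotes('He said "hi" and "bye".')
```

A strict alternation rule of this kind would mishandle nested or unbalanced quotations, which is one reason a practical editor might need additional context.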
Text Editing and Processing, Symbolics, Inc., #999020, July 1986, pp. 24-31 and 63-111, describes text editing features of a version of Emacs called "Zmacs." Pages 67-70 describe mouse operations, including clicking on a word to copy the whole word; on a parenthesis to copy it, its matching parenthesis, and the text between them; on a quotation mark to copy it, its matching quotation mark, and the text between them; or after or before a line to copy the whole line. Appropriate spaces are placed before inserted objects, so that spaces are automatically inserted around an inserted word or sentence. Pages 71-75 describe motion commands, including motion by word, meaning a string of alphanumeric characters; by sentence, ending with a question mark, period, or exclamation point that is followed by a newline or by a space and a newline or another space, with any number of closing characters between the sentence ending punctuation and the white space that follows; and by line, delimited by a newline. Page 79 describes motion by paragraph, delimited by a newline followed by blanks, a blank line, or a page character alone on a line; page 80 describes motion by page, delimited by a page character. Chapter 5, pages 83-97, describes deleting and transposing text, with pages 87-89 describing how contents of a history are retrieved. Chapter 6, pages 99-111, describes working with regions, and discusses point and mark.
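The sentence-ending rule summarized above lends itself to a regular expression. The following is a sketch of that rule as described here, not Zmacs source code; the pattern name, the function name, and the particular set of closing characters (right parenthesis, right bracket, and quotation marks) are assumptions.

```python
import re

# Terminal punctuation, then any number of closing characters, then (as a
# lookahead) a newline, a space and a newline, or two spaces.
# Assumption: the closing characters are ) ] " and '.
SENTENCE_END = re.compile(r"""[.?!][)\]"']*(?=\n| \n|  )""")


def sentence_ends(text):
    """Return the index just past each sentence ending found in text."""
    return [m.end() for m in SENTENCE_END.finditer(text)]


sample = 'He left.  "Why?"  Dr. Smith stayed.\n'
ends = sentence_ends(sample)
```

Under this rule the period in "Dr." does not end a sentence in the sample, because it is followed by a single space and a word rather than by two spaces or a newline.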
Kucera et al., U.S. Pat. No. 4,773,009, describe a text analyzer that analyzes strings of digitally coded text to determine paragraph and sentence boundaries. As shown and described in relation to FIGS. 3-4, each string is broken down into component words. Possible abbreviations are identified and checked against a table of common abbreviations to identify abbreviations that cannot end a sentence. End punctuation and the following string are analyzed to identify the terminal word of a sentence. When sentence boundaries have been determined, a grammar checker, punctuation analyzer, readability analyzer, or other higher-level text processing can be applied.
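The general idea of checking possible abbreviations against a table before accepting a sentence boundary can be sketched as follows. This is a simplified illustration, not the method of Kucera et al.; the table contents, the function name, and the whitespace-based word splitting are hypothetical.

```python
# Hypothetical table of abbreviations assumed never to end a sentence.
NON_TERMINAL_ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e."}


def split_sentences(text):
    """Split text into sentences, skipping periods that belong to
    abbreviations listed in the table."""
    sentences = []
    current = []
    for word in text.split():
        current.append(word)
        # Ignore closing quotes or brackets after the end punctuation.
        core = word.rstrip("\"')]")
        ends_with_terminal = core.endswith((".", "?", "!"))
        if ends_with_terminal and core.lower() not in NON_TERMINAL_ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing words without end punctuation form a final fragment.
        sentences.append(" ".join(current))
    return sentences


result = split_sentences("Dr. Smith arrived. He met Mr. Jones.")
```

Unlike the analyzer described above, this sketch does not examine the string that follows the end punctuation, so it cannot handle an abbreviation that happens to fall at the true end of a sentence.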