Digital computers with visual displays and user input devices are widely used to create text-based electronic documents such as e-mail messages and letters. Text is usually entered using a keyboard attached to a personal computer, but may also be entered by other means, such as a touch-sensitive display screen or a microphone combined with speech recognition software. A software application receives and processes the text, which may involve formatting, storing, and transmitting the accumulated entered text as directed by a user. These applications, typically called word processors, provide a digital means for a person to engage in the process of writing.
The writing process requires significant exercise of the user's intellect to decide what concepts to express, to express those concepts in grammatically correct sentences using appropriate words, to physically enter those sentences into the computer, and to review and edit the entered text. It is a complex and time-consuming process for many. One challenge is that entry and editing by keyboard requires the skill to hit the correct keys quickly and in the correct order. Another challenge is that entering text that represents complex thought can be time-consuming and frustrating, particularly on small systems with a small keyboard or touch screen. The user interface of the computer, which is managed by the software receiving the input text, can substantially affect both the speed of text entry and the quality of the text entered.
Interfaces have been devised to increase the speed and quality of entry, for example by checking spelling and grammar and by suggesting or automatically making corrections. Such capabilities may improve the quality of the text with respect to spelling and grammar, but they do not assist a user in selecting an appropriate word for use in a particular context.
Systems that predict words based on partial word entry have been developed. These systems typically rely on word lists, knowledge of properties of the language being used, and information on how that language is normally used. Some systems use information about the frequency of use of words and the probability that a particular word will follow one or more other particular words in a sentence. Such systems typically display their best prediction on the screen as a completion of the word being entered, and the user may either accept the suggested word or type over it. Alternatively, they may display a list of several suggested words from which the user can choose one to complete the word being entered.
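The kind of prediction described above can be illustrated with a minimal sketch. It assumes a tiny training corpus and ranks candidate completions first by how often each word has followed the previous word (a bigram count) and then by overall word frequency. The function names `build_model` and `suggest` are hypothetical and do not correspond to any particular prior system.

```python
from collections import Counter, defaultdict


def build_model(corpus):
    """Count word frequencies and word-pair (bigram) frequencies.

    `corpus` is an iterable of sentences (illustrative input only).
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words)
        # Record how often each word follows the word before it.
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def suggest(prefix, prev_word, unigrams, bigrams, limit=3):
    """Return up to `limit` known words that start with `prefix`.

    Candidates are ordered by bigram count after `prev_word`, then by
    overall frequency, then alphabetically to keep the order stable.
    """
    candidates = [w for w in unigrams if w.startswith(prefix)]
    follow = bigrams.get(prev_word, Counter())
    candidates.sort(key=lambda w: (-follow[w], -unigrams[w], w))
    return candidates[:limit]


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the car is fast", "a cat can nap"]
    unigrams, bigrams = build_model(corpus)
    print(suggest("ca", "the", unigrams, bigrams))
    print(suggest("ca", "cat", unigrams, bigrams))
```

A system of this kind could show only the first suggestion as an inline completion, or show the whole list for the user to choose from.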
The effectiveness of such word prediction systems depends primarily on how often the intended word is displayed to the user, particularly where few or no letters of the word have been entered by the user. Basic word prediction systems, such as those based only on word lists, are likely to suggest words that are obviously inappropriate because the systems take no account of the context. A suggested word may be grammatically incorrect, or may have no relationship to the subject matter of the text. This has led to various incremental improvements, such as evaluating the grammar and restricting suggestions to those that may be grammatically applicable (as in Morris C, et al. “Syntax PAL: a system to improve the written syntax of language-impaired users.” Assist. Technol. 1992; 4(2):51-9.), and using multiple prediction techniques and then choosing the one determined to be best (as in U.S. Pat. No. 5,805,911).
The probability that correct words will be suggested by a word prediction system can be increased by basing the list of possible words on the topic the user is writing about. Topical areas generally have differing vocabularies, and the frequency of use of particular words varies by topical area. For example, if a user is writing about baseball and enters the letters “ba” into an interface, it is more likely, given the topic, that the user is writing the word “bat,” “base,” or “ball” than “bath” or “baby,” even if the latter words are more common in general usage. Some systems have attempted to use pre-defined topic word lists that may be customized by the user and selected for use by the prediction software. Some systems automatically select topic words, or require a user to manually identify topic words, from a document that the user identifies as topical. A problem with such systems is that they have a limited number of topic word sets, and there may not be an appropriate set for the user to select. The user may be left to choose an inappropriate topic, with the result that the system suggests inappropriate words that are unhelpful to the user.
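The “ba” example can be made concrete with a short sketch that blends general word frequencies with frequencies from a topic word list. The frequency values, the `topic_weight` parameter, and the function name `suggest_for_topic` are all assumptions made for illustration; they are not measured data and do not describe any particular existing system.

```python
def suggest_for_topic(prefix, general_freq, topic_freq,
                      topic_weight=0.8, limit=3):
    """Rank words starting with `prefix` by a blend of two frequency tables.

    `topic_weight` (a hypothetical tuning parameter between 0 and 1)
    controls how strongly the topic word list influences the ranking.
    """
    def score(word):
        general = general_freq.get(word, 0.0)
        topical = topic_freq.get(word, 0.0)
        return (1 - topic_weight) * general + topic_weight * topical

    vocabulary = set(general_freq) | set(topic_freq)
    candidates = [w for w in vocabulary if w.startswith(prefix)]
    # Highest score first; alphabetical order breaks ties.
    return sorted(candidates, key=lambda w: (-score(w), w))[:limit]


# Illustrative relative frequencies only (invented for this sketch).
GENERAL = {"baby": 0.40, "bath": 0.25, "ball": 0.15, "base": 0.12, "bat": 0.08}
BASEBALL = {"ball": 0.45, "base": 0.30, "bat": 0.24, "baby": 0.01}

if __name__ == "__main__":
    # With the baseball topic applied, topical words rise to the top.
    print(suggest_for_topic("ba", GENERAL, BASEBALL))
    # With the topic ignored, general usage decides the order.
    print(suggest_for_topic("ba", GENERAL, BASEBALL, topic_weight=0.0))
```

The sketch also shows the limitation noted above: if no suitable topic table exists, the ranking falls back to general usage, and if an unsuitable table is chosen, the ranking is skewed toward words the user does not want.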
When a user is writing about an unfamiliar topic, the user may not have the knowledge or the vocabulary to express the user's thoughts in writing. Systems that merely attempt to complete partially entered words do not help users identify a suitable word where the user is unaware of the most suitable word, or where its use has not occurred to the user. This may be a significant deficiency when the user is not very familiar with the topic being written about, which can happen in many situations.
Approaches have been developed for the automatic extraction of keywords from sets of documents, generally in the context of document categorization and retrieval systems. Such systems may also assist in determining the best search words to use when searching a set of documents or the internet for information related to a particular topic. For example, U.S. Pat. No. 5,987,460 describes a method and system for extracting and displaying keywords that operates on sets of documents that have been pre-selected to relate to a particular topic. Such a system would be of limited assistance to a user who is writing a document and selecting an appropriate word to use, because it generates only a limited set of keywords for the purpose of refining a search.
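As a rough illustration of keyword extraction in general, the sketch below scores each word in a set of topic documents by its count in those documents multiplied by a smoothed inverse document frequency computed over a background set. This is a generic TF-IDF-style weighting chosen for illustration; it is not the method of the cited patent, and the function name `extract_keywords` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def extract_keywords(topic_docs, background_docs, limit=3):
    """Return words that are frequent in `topic_docs` but rare in
    `background_docs`, using a simple TF-IDF-style score."""
    # Term frequency across all topic documents.
    topic_counts = Counter(
        word for doc in topic_docs for word in doc.lower().split()
    )

    # Number of background documents that contain each word.
    doc_freq = Counter()
    for doc in background_docs:
        doc_freq.update(set(doc.lower().split()))

    n = len(background_docs)

    def score(word):
        # Smoothed inverse document frequency: words found in every
        # background document score zero.
        idf = math.log((1 + n) / (1 + doc_freq[word]))
        return topic_counts[word] * idf

    # Highest score first; alphabetical order breaks ties.
    return sorted(topic_counts, key=lambda w: (-score(w), w))[:limit]


if __name__ == "__main__":
    topic = ["the pitcher threw the ball", "the batter hit the ball"]
    background = ["the weather is nice", "the train is late",
                  "the meeting ran long"]
    print(extract_keywords(topic, background))
```

A short list of this kind may be adequate for refining a search, but it covers only a small part of the vocabulary a writer might need for the topic.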