A wide variety of text input systems are designed to enable the user to input text at increased rates. The increased rate of text input is often made possible by enabling the user to perform actions with less precision than is required by comparable systems, since it is generally possible to perform a given set of actions more rapidly when the actions can be performed with less precision. In general, this is achieved by defining a lexicon of textual objects that can be generated by the system, along with a basis on which input actions performed by the user are mapped to one or more of these textual objects in the lexicon. The system then analyzes one or more input actions performed by the user, determines which textual object is most likely the object intended by the user in performing the input actions, and generates the determined object as the text corresponding to the input actions. One example is the system disclosed by Robinson et al. in U.S. Pat. No. 6,801,190, which is based on a virtual keyboard where the user is able to enter text by means of imprecise tapping. Another example is the “ShapeWriter” system (disclosed by Zhai in US patent publication US 2004-0120583 A1) that recognizes word patterns based on the shape of a traced path on a virtual keyboard by comparing them to a library of shape prototypes associated with words. Another example is the system disclosed by Kushler et al. in U.S. Pat. No. 7,098,896.
In other systems (for example, speech recognition and handwriting recognition systems), the user is enabled to input text by inputting information in an alternate “modality” (i.e., by speaking in the case of a speech recognition system; by writing cursively in the case of handwriting recognition systems), rather than by performing actions in the same modality but with less precision (as in the case of the virtual keyboard of Robinson et al.). In such “alternate modality” systems, the user's input actions (spoken words in the case of speech recognition; cursive text in the case of handwriting recognition) are likewise mapped (“recognized”) as corresponding to one or more textual objects in the system's lexicon. Other systems inherently introduce ambiguity into the text input actions performed by the user, and consequently also need to map input actions performed by the user to one or more textual objects in a lexicon. A well-known example is the input system known commercially as “T9” (disclosed by Grover et al. in U.S. Pat. No. 5,818,437) that is commonly used on cellular phone keypads with a limited number of keys, where each key of a subset of the standard phone keys is associated with a plurality of distinct letters, and a key sequence entered by a user is mapped to one or more words whose letters correspond to the letters associated with the keys in the input sequence.
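The ambiguity inherent in a reduced keypad can be illustrated with a minimal sketch. It is not the method disclosed by Grover et al.; it only shows, under simple assumptions, how one key sequence can correspond to several words in a lexicon. The names `KEY_LETTERS`, `key_sequence`, `build_index`, and the toy `LEXICON` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Letter assignment of the conventional phone keypad (keys 2 through 9).
KEY_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Reverse map: each letter points to the single key that carries it.
LETTER_TO_KEY = {
    letter: key for key, letters in KEY_LETTERS.items() for letter in letters
}


def key_sequence(word):
    """Return the key sequence a user would press to enter `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(lexicon):
    """Group the words of the lexicon by their (ambiguous) key sequence."""
    index = defaultdict(list)
    for word in lexicon:
        index[key_sequence(word)].append(word)
    return index


# Hypothetical toy lexicon; a real system would use a large word list,
# typically ordered by frequency of use.
LEXICON = ["good", "home", "gone", "hood", "text"]

index = build_index(LEXICON)
candidates = index["4663"]  # every lexicon word spelled by keys 4-6-6-3
```

In this toy lexicon four words share the sequence 4-6-6-3, so the system must output one of them and treat the others as alternate interpretations.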
Despite the differences in the nature of the input actions performed by the user in these various systems (hereinafter “input action recognition systems”), and in the manner in which these input actions are mapped to textual objects in the system lexicons, there are a number of characteristics that input action recognition systems generally have in common:
1) The textual objects identified by such input action recognition systems are output by a text presentation system (most commonly implemented as a region of a display device on which the text is displayed) such that the generated text can be further edited by the user.
2) There are instances in which a given input action (or sequence of input actions) is mapped to more than one possible corresponding identified textual object, and the identified textual object determined to most closely correspond to the input action(s) is subsequently output to the text presentation system.
3) The system can maintain a record (at least for a limited number of the most-recently generated textual objects) of one or more of the alternate textual objects also determined to correspond reasonably closely to the input action, and (at least temporarily) associate these alternate textual objects with the textual object actually generated for output. Alternatively or in addition, the system can record certain data or information regarding the input actions and associate this data with the record of the associated alternate textual interpretations, or re-process this recorded data regarding the input actions to identify alternate textual objects at a later time.
4) There are instances in which the identified textual object that is determined to most closely match the input action(s), and that is therefore output through the text presentation system, is not the textual object that the user intended to generate, so the user needs to edit the generated text to make it correspond to the intended text.
5) There is a text insertion location (or position) within the text presentation system where the next textual object generated by the user will be inserted. This is commonly referred to as the “cursor” (sometimes “caret”) position (hereinafter the “text insertion position”).
6) There is a text editing action (or “gesture”) by means of which the user can change the text insertion position to a new location in the text presentation system. In the great majority of systems, the text presentation system comprises a text output region on a display. In mouse-based systems, for example, this action is, perhaps universally, a single click of the mouse performed within the text output region. In stylus-based touch-screen systems this action is, again, perhaps universally, a single tap of the stylus performed within the text output region.
7) There are in general two separate classes into which characters processed by the system are classified. One class consists of those characters that can be validly used to form one or more of the textual objects generated by the system (hereinafter “textual object characters”). The second class consists of one or more characters or types of characters that are treated by the system as delimiter characters that are not contained in the textual objects generated by the system (hereinafter “delimiters”). The class of delimiter characters very commonly includes “white space” characters (space, tab, return, and so forth), and often other punctuation characters.
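Characteristics 2, 3 and 7 above can be sketched together in a few lines. This is an illustrative sketch only, not the implementation of any of the cited systems: the scoring values, the `OutputRecord` structure, the `DELIMITERS` set, and the function names are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Assumed delimiter class: white space plus some common punctuation.
DELIMITERS = set(" \t\n.,;:!?")


@dataclass
class OutputRecord:
    """A generated textual object together with its recorded alternates."""
    text: str
    alternates: list = field(default_factory=list)


def recognize(scored_candidates):
    """Pick the best-scoring candidate and keep the rest as alternates.

    `scored_candidates` is a list of (textual_object, score) pairs, where a
    higher score means a closer match to the input action(s).
    """
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    best_text = ranked[0][0]
    return OutputRecord(best_text, [text for text, _ in ranked[1:]])


def split_textual_objects(text):
    """Split output text into textual objects, discarding delimiters."""
    objects, current = [], []
    for ch in text:
        if ch in DELIMITERS:
            if current:
                objects.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        objects.append("".join(current))
    return objects


# Hypothetical scores for one ambiguous input action.
record = recognize([("home", 0.3), ("good", 0.6), ("gone", 0.1)])
words = split_textual_objects("good home, gone.")
```

Here `record.text` is the object that would be output, while `record.alternates` holds the other interpretations the system retains for possible later correction.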
Another characteristic that tends to be shared by the vast majority of users across all of these disparate input action recognition systems is that users tend to perform input actions faster and faster (and consequently with less and less precision) until a certain number of “errors” start to occur, where an error is an instance in which the textual object that is output by the system to the text presentation system is not the textual object that the user intended to generate. Another characteristic that is common to virtually every text editing system is that any editing action performed by the user (wherein any text present in the text presentation system is modified by a user action) results in the text insertion position being re-located to the location of the edited text. While this behavior made sense (and was also in a sense unavoidable) in the original mouse-and-keyboard model for text editing, in many scenarios in the input action recognition systems described above, this behavior is no longer desirable. In the “text editing” that is necessitated by these scenarios, the user is in general trying to input a particular stream of text. At some point in the process, the user looks at the text output region (or otherwise examines the text presentation system) and notices that the text that has been generated differs in some way from the text that the user intended to generate, due to the fact that one or more of the user's preceding input actions were “incorrectly recognized” by the system so that one or more textual objects other than those intended by the user have appeared somewhere earlier in the text output region. In most cases, the user simply wishes to correct the incorrectly recognized text (in essence, to “re-map” or “re-edit” the system's interpretation of the original input action(s) performed by the user when the textual object was generated), and to continue to input new text at the current text insertion position. 
However, the problem remains that, with existing systems, it is not possible to edit any incorrectly recognized text without re-locating the text insertion position to the location of the edited text.
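The conventional behavior described above can be made concrete with a small sketch. The class `ConventionalTextBuffer` and its methods are hypothetical and model only the behavior attributed here to existing systems, in which any edit re-locates the text insertion position to the edited text.

```python
class ConventionalTextBuffer:
    """Toy model of a text output region with a text insertion position."""

    def __init__(self):
        self.text = ""
        self.insertion_position = 0

    def insert(self, new_text):
        """Insert text at the insertion position and advance past it."""
        pos = self.insertion_position
        self.text = self.text[:pos] + new_text + self.text[pos:]
        self.insertion_position = pos + len(new_text)

    def replace_range(self, start, end, replacement):
        """Replace text[start:end]; as in conventional editors, the
        insertion position moves to the end of the edited text."""
        self.text = self.text[:start] + replacement + self.text[end:]
        self.insertion_position = start + len(replacement)


buf = ConventionalTextBuffer()
buf.insert("I am going hone now")      # "hone" was incorrectly recognized
position_before_edit = buf.insertion_position

buf.replace_range(11, 15, "home")      # correct the misrecognized word
position_after_edit = buf.insertion_position

buf.insert(" soon")                    # new text lands mid-sentence
```

After the correction, the insertion position sits immediately after “home” rather than at the end of the text, so the user must explicitly move it back before continuing to enter new text.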
Standard desktop computing systems are almost universally equipped with a full-size desktop keyboard and a mouse (or mouse equivalent such as a trackball or a graphic tablet digitizer). As a result, the majority of users are comfortable and relatively efficient using a keyboard and mouse for text input on desktop systems. For portable, handheld devices, the desktop keyboard and mouse are impractical due to their size and the need (in the case of a standard mouse) for a flat, relatively stable surface. Consequently, many of the above input action recognition text input systems were either developed specifically for use on portable, handheld devices, or are often viewed as being particularly useful when used on such devices.
Portable computing devices continue to get more powerful and more useful. The touch-screen has proved to be a very useful, flexible and easy-to-use interface for portable devices. The touch-screen interface is used on a wide variety of portable devices, including larger devices such as Tablet PCs, but it has been found to be particularly effective on smaller devices such as PDAs and mobile phones. The development of such devices has largely been focused on two conflicting goals: one is making the devices smaller, and another is making them easier, faster and more convenient to use.
One user interface element that is commonly used in a wide variety of systems is a menu of choices that is presented to the user to allow the user to select a desired response from among the alternatives presented. This user interface element is frequently used in the above input action recognition text input systems, since it is often necessary to present the user with a list of possible alternative textual interpretations of one or more input actions, and allow the user to select the correct interpretation that reflects the user's intention in performing the input actions. In a system based on a touch-screen interface, the most natural way to interact with an on-screen menu is to select the desired option simply by contacting it with a stylus or a finger. It is often desirable to minimize the amount of display area required to display the menu, so that other elements on the display are not obscured. On the other hand, given that the user selects the desired menu option by touching it, the smaller the menu is, the more precise the user must be in a selection action, and therefore the more difficult the menu is to use.
Thus, there is a natural tension in these usability aspects. A second similar consideration arises from the fact that another often desired design goal is to enable the user to use a touch-screen system with a finger, rather than requiring the use of a stylus or other specialized instrument for interacting with the screen. This creates the same tension, since a fingertip is generally less precise than a stylus, so designing a menu such that selections can be performed with the user's finger generally requires making the displayed menu substantially larger.
As mentioned above, in the prior art, making a selection from a touch-screen menu has required that the user directly contact the desired selection. In some implementations, the user is required to control the placement of the stylus such that the first contact of the screen by the stylus (or finger) occurs within the region associated with the desired selection. In other approaches, the user is allowed to initially contact the screen within any of the set of active selection regions, then slide the stylus until it is within the region associated with the desired selection (without breaking contact with the screen) before lifting the stylus. In the first approach, the initial contact location must be carefully controlled to achieve the desired selection, and in the second approach the final contact location must be similarly controlled. Since each menu selection item is represented as a given two-dimensional region of the displayed menu, in either approach the user is required to control the placement of the screen contact in two dimensions in order to effect the desired menu selection. The smaller each two-dimensional menu region is, the more precise the user must be in this contact action, and, in general, the more time will be required to make a menu selection.
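The two prior-art selection approaches described above can be sketched as follows. The sketch assumes rectangular menu regions and a contact path given as a list of (x, y) points from first contact to lift-off; the `MenuItem` class, the function names, and the example geometry are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MenuItem:
    """A menu selection occupying a rectangular region of the display."""
    label: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def hit_test(items, px, py):
    """Return the label of the item containing the point, or None."""
    for item in items:
        if item.contains(px, py):
            return item.label
    return None


def select_on_first_contact(items, path):
    """First approach: the initial contact point decides the selection."""
    return hit_test(items, *path[0])


def select_on_lift(items, path):
    """Second approach: contact must begin inside some menu region, and
    the point where contact is broken decides the selection."""
    if hit_test(items, *path[0]) is None:
        return None
    return hit_test(items, *path[-1])


# Hypothetical menu of three stacked items, each 100 wide and 20 high.
menu = [MenuItem(word, 0, 20 * row, 100, 20)
        for row, word in enumerate(["good", "home", "gone"])]

# Contact begins on "good", slides down, and lifts on "gone".
path = [(50, 10), (50, 30), (50, 50)]
first = select_on_first_contact(menu, path)
lifted = select_on_lift(menu, path)
```

In both functions the outcome depends on a single point falling inside a two-dimensional region, which is why shrinking the regions demands more precise contact in either approach.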