The problem of entering text into computing devices (e.g., cellular phones, tablets, laptops, smart watches, smart glasses, and the like) is of particular importance in many fundamental applications including, but not limited to, document production (e.g., composing an email, writing a report or a note), text messaging (e.g., SMS, MMS, IM, chats), and wireless internet browsing.
Current solutions that address the above-mentioned problem may be broadly classified into two types: (a) text prediction and (b) speech-to-text. Solutions utilizing text prediction come in two different flavors. In the first, word choices are predicted and displayed as soon as the user types some of the letters of the desired word; the first choice is normally the typed letters themselves, followed by several other choices that the user can select either by touching the displayed choice (on a touch device) or by pressing the space bar. An example of this form of text prediction is the well-known xt9® software, sold by Nuance Communications, Inc. of Burlington, Mass., which comes pre-installed on several current-day phones. In the xt9® software, for example, a user types the letters “aqe” and notices that the word “awesome” is listed as a choice within a list of three or more word choices. In the second form of text prediction, referred to as auto-correction, a user types out the entire word and the auto-correction solution displays the choices it considers correct; the desired choice is inserted when the user touches it or presses the space bar. For example, on some popular smartphones, a user types “aqesome” and sees the corrected word “awesome” inserted upon pressing the space bar. Although the basic concept of text prediction involves predicting words from partially or completely typed ambiguous letters, several enhancements exist based on varied keyboard designs, gesture-based typing (e.g., Swype), handwriting recognition, and so on.
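The ambiguous-letter matching described above can be illustrated with a minimal Python sketch. It does not describe how the xt9® software or any commercial auto-correction engine works internally; it assumes a hypothetical keyboard-adjacency model (each typed letter may be either the intended key or a physically adjacent key on an approximate QWERTY layout), a hypothetical lexicon with made-up frequencies, and hypothetical helper names (`build_neighbors`, `key_matches`, `score`, `predict`). The same matcher covers both flavors: prefix prediction from partially typed letters, and auto-correction of a completely typed word.

```python
# Approximate QWERTY letter layout. Assumption: each row is treated as
# shifted half a key to the right of the row above it.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each letter to the set of letters on adjacent keys."""
    pos = {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)}
    neighbors = {ch: set() for ch in pos}
    for ch, (r, c) in pos.items():
        candidates = [
            (r, c - 1), (r, c + 1),      # left and right on the same row
            (r - 1, c), (r - 1, c + 1),  # upper-left and upper-right
            (r + 1, c - 1), (r + 1, c),  # lower-left and lower-right
        ]
        for rr, cc in candidates:
            if 0 <= rr < len(rows) and 0 <= cc < len(rows[rr]):
                neighbors[ch].add(rows[rr][cc])
    return neighbors


NEIGHBORS = build_neighbors(QWERTY_ROWS)

# Hypothetical lexicon: word -> usage frequency (made-up counts).
LEXICON = {"awesome": 50, "awe": 20, "are": 900, "aqua": 5, "away": 300, "sweet": 40}


def key_matches(typed, intended):
    """True if the typed letter is the intended key or one of its neighbors."""
    return typed == intended or typed in NEIGHBORS.get(intended, set())


def score(typed, word):
    """Count exact letter hits over the typed span; None if any letter cannot match."""
    hits = 0
    for t, w in zip(typed, word):
        if not key_matches(t, w):
            return None
        if t == w:
            hits += 1
    return hits


def predict(typed, lexicon, complete_word=False, k=3):
    """Rank lexicon words that could have produced the ambiguous key sequence.

    complete_word=False: prefix prediction (candidate may be longer than input).
    complete_word=True:  auto-correction (candidate must have the same length).
    Ties on exact hits are broken by frequency, then alphabetically.
    """
    ranked = []
    for word, freq in lexicon.items():
        if len(word) < len(typed):
            continue
        if complete_word and len(word) != len(typed):
            continue
        hits = score(typed, word)
        if hits is not None:
            ranked.append((-hits, -freq, word))
    ranked.sort()
    return [word for _, _, word in ranked[:k]]
```

With this toy lexicon, `predict("aqe", LEXICON)` returns `['awesome', 'awe', 'sweet']`, while `predict("aqesome", LEXICON, complete_word=True)` returns `['awesome']`. Even this small example yields several candidates for a three-letter input, which illustrates why prefix-based prediction typically presents a list of choices rather than a single word.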
One of the primary drawbacks of text prediction is its awkward user interface: a user needs to type a couple of letters, then lift their head, shift their gaze, or in some other manner change their focus to check whether the desired word is accurately displayed, before moving on to the next word. Although the number of key presses is significantly reduced, the need to lift one's head, shift one's gaze, or otherwise change focus disrupts the user's flow of text composition, resulting in a fairly sub-optimal user experience. The problem would be solved if text prediction were accurate enough to display only one word choice that is correct nearly 100% of the time. Unfortunately, even with the most sophisticated statistical n-gram models, it is almost impossible to accurately predict a word from the myriad of choices using only word statistics and context.
Text entry using speech-to-text is an entirely different alternative in which a user presses a “speak” button and speaks a phrase. The computing device then converts the user's speech to text and displays the transcription for further action by the user. Speech-to-text has seen limited adoption because of the technology's inherent lack of robustness and accuracy, which stems primarily from problems associated with large vocabulary sizes, varied accents, varied pronunciations, mismatched language models, background noise, channel noise, and the like.
Additionally, the user interface associated with speech-to-text is quite different from the widely adopted keyboard interface. Specifically, to use speech-to-text, a user has to compose a whole phrase mentally, then press a “speak” button, and subsequently speak the phrase aloud in one go. This is well suited to accessing information via the internet, such as when using Siri® voice recognition software on the iPhone® mobile digital device manufactured by Apple Inc. Unfortunately, this interface is not natural for composing long text documents. As a result, speech-to-text has not achieved mainstream adoption except in hands-busy situations, such as driving a car, or as an accessibility technology for people with disabilities.
Multimodal solutions that combine speech and typing have been proposed several times in the literature. The inventors have determined that, although all of the approaches described above have significant merits, they have failed to render the widely adopted “qwerty keyboard” obsolete. In fact, the qwerty keyboard continues to dominate as the most preferred input method.