The problem of entering text into devices having small form factors (such as cellular phones, personal digital assistants (PDAs), the RIM BlackBerry, the Apple iPod, and others) using multimodal interfaces, especially those using speech, is a long-standing one. This problem is of particular importance in many practical mobile applications, including text messaging (short messaging service or SMS, multimedia messaging service or MMS, email, instant messaging or IM), wireless Internet browsing, and wireless content search.
Although many attempts have been made to address the above problem using “Speech Recognition”, there has been limited practical success. These attempts rely on a push-to-speak configuration to initiate speech recognition. Push-to-speak configurations introduce a change in behavior for the user and reduce the overall throughput, especially when speech is used for input of text in a multimodal configuration. Typically, these configurations require a user to speak after some indicator provided by the system; for example, the user speaks only after hearing a beep. Push-to-speak configurations also have impulse noise associated with the push of a button, which reduces speech recognition accuracy.