In 2005, over one trillion text messages were sent by users of mobile phones and similar hand-held devices worldwide. Text messaging usually involves the input of a text message by a sender or user of the hand-held device, wherein the text message is generated by pressing letters, numbers, or other keys on the sender's mobile phone. Email-enabled devices, such as the Palm Treo or RIM Blackberry, enable users to generate emails quickly in a similar manner. Further, such devices typically also have the capability of accessing web pages or information on the Internet. Searching for a desired web page is often accomplished by running a search on any of the commercially available search engines, such as google.com, msn.com, yahoo.com, etc.
Unfortunately, because such devices make it so easy to type a text-based message for a text message, email, or web search, it is quite common for users to attempt to do so when they actually need to focus their attention or hands on another activity, such as driving. Beyond those more capable hand-helds, the vast majority of the market consists of devices with small keypads and screens, making text entry even more cumbersome, whether the user is stationary or mobile. In addition, it would be advantageous for visually impaired people to be able to generate a text-based message without having to type the message into the hand-held device or mobile phone. For these and many other reasons, there has been a need in the mobile and hand-held device industry for users to be able to dictate a message and have that message converted into text. Such text can then be sent back to the user of the device for sending in a text message, email, or web application. Alternatively, such text can be used to cause an action to be taken that provides an answer or other information, not just a text version of the audio, back to the user of the device.
Some currently available systems in the field have attempted to address these needs in different ways. For example, one system has used audio telephony channels for transmission of audio information. A drawback to this type of system is that it does not allow for synchronization between visual and voice elements of a given transaction in the user interface on the user's device, which requires the user, for example, to hang up her mobile phone before seeing the recognized results. Other systems have used speaker-dependent or grammar-based systems for conversion of audio into text, which is not ideal because the former requires each user to train the system on her device to understand her unique voice, and the latter means that utterances can only be compared to a limited domain of potential words. Neither is feasible or desirable for most messaging needs or applications. Finally, other systems have attempted to use voice recognition or audio-to-text software installed locally on the hand-held devices. The problem with such systems is that they typically have low accuracy rates because the amount of memory space on hand-held devices necessarily limits the size of the dictionaries that can be loaded therein. In addition, voice recognition software installed on the hand-held device typically cannot dynamically adapt to handle new web services as they appear, which is a tremendous benefit of server-based solutions.
Thus, there remains a need in the industry for systems, methods, and thin-client software solutions that enable audio to be captured on a hand-held device; that can display text results back in real time or near real time; that are speaker independent, so that any customer can use them immediately without having to train the software to recognize the specific speech of the user; that use the data channel of the device and its communication systems, so that the device user is able to interact with the system without switching context; that use a backend server-based processing system, so that free-form messages can be processed; and that have the ability to expand their capabilities to interact with new use cases and web services in a dynamic way.
Therefore, a number of heretofore unaddressed needs exist in the art to address the aforementioned deficiencies and inadequacies.