The use of mobile telephones is becoming more prevalent. More people now carry mobile telephones wherever they go; they expect a signal in most locations and expect to be able to communicate with others and to receive information virtually anytime and anywhere. But the current technological climate hampers the ability of a person to send and receive information.
U.S. patent application Ser. No. 11/368,840 filed Mar. 6, 2006 entitled “Message Transcription, Voice Query and Query Delivery System” presents techniques for converting speech to text and is hereby incorporated by reference. Nevertheless, there is room for improvement.
Currently, speech recognition software requires that each user have a custom user profile. These user profiles are distributed in the sense that a user must have numerous user profiles if he or she uses different speech recognition software. (For example, while the DRAGON brand software from Nuance Corporation might be used on an IBM-compatible computer, it cannot be used on a computer from Apple Inc., so the user may choose the ILISTEN brand software available from MacSpeech, Inc. for use on an Apple computer.) Further, even if the user always uses a single brand of computer, his or her speech profile must be physically transported and installed on each computer (home, office, travel computer) that the user might be using.
The huge vocabulary of potential words that a user might speak also presents a problem. Speech recognition companies attempt to ameliorate this problem by providing language-specific versions of their software tailored to specific categories of users. For example, a speech recognition engine may provide versions based upon “English,” “American English,” “Indian English,” etc., in an attempt to reduce the vocabulary required and to increase accuracy of the engine. Nevertheless, each engine may still require a vocabulary of 50,000 to 100,000 words in order to accurately convert speech to text for any potential user in a given category (in order to match any potential spoken word with a known word in the vocabulary).
Further compounding the problem is that each user of a particular brand of speech recognition software must train that software for it to be accurate. At least two to three hours of training are typically required. Although certain speech engines advertise that no training is required, realistically, at least a minimal amount of training is needed; otherwise, accuracy suffers. It is not uncommon for a professional user of speech recognition software to spend many hours training that software in order to achieve the highest accuracy. And finally, a user or enterprise must deal with the mechanics of installing and maintaining speech recognition software, which can be a great burden. The software must be selected based upon available computers, purchased, installed and maintained. Problems with computer compatibility, lack of memory, etc., are not uncommon. Many versions of installed speech recognition software are out of date (and hence less accurate) because the user or enterprise has not bothered to update the software.
In addition, a user may wish to perform an action, request a service, or retrieve information from a company, web site or other location when all the user has at his or her disposal is a mobile telephone, voice-enabled computer or other similar voice input device. It can prove difficult for a user to find a person to speak with. Moreover, even if a company has a software application or web site that has the information the user desires or that has the capability to perform a particular service, such software application or web site may be unable to handle the user's request by voice.
Further, various hardware devices such as telephones, cameras, television remote controls, navigation devices, etc., are becoming increasingly complex to use. A user may know the exact result he or she wishes to achieve with the device but may not know the required instructions, sequence of buttons, controls, etc., to make the device perform the desired function. Currently, it is not feasible for each and every hardware device to incorporate artificial intelligence such that the device can understand a user's speech command and perform the desired function. Yet, a user would like to be able to give a voice command to a device in order to control it.
Another challenge facing any system that handles user speech is the poor quality of the user speech and the possibility of dropped connections. Generally, the quality of a live telephone connection (especially with mobile telephones, cordless home telephones, “smart” telephones, a VoIP connection, a SKYPE-type telephone service, etc.) can be poor compared with traditional, wired analog telephones. Any service that handles user speech arriving over a live telephone connection must deal with lower quality voice data. Also, any time user speech is being recorded over a live telephone connection, there is always the possibility of dropouts, static, dead zones, and a dropped connection.
Based upon the above state of technology and the needs of individuals, various systems, services and methods are desired that would address the above needs.