Currently, speech recognition software requires that each user have a custom user profile. These user profiles are distributed in the sense that a user must have numerous user profiles if he or she uses different speech recognition software. (For example, while the DRAGON brand software from Nuance Corporation might be used on an IBM-compatible computer, it cannot be used on a computer from Apple Inc., so the user may choose the ILISTEN brand software available from MacSpeech, Inc. for use on an Apple computer.) Further, even if the user always uses a single brand of computer, his or her speech profile must be physically transported and installed on each computer (home, office, travel computer) that the user might be using.
The huge vocabulary of potential words that a user might speak also presents a problem. Speech recognition companies attempt to ameliorate this problem by providing language-specific versions of their software tailored to specific categories of users. For example, a speech recognition engine may provide versions based upon “English,” “American English,” “Indian English,” etc., in an attempt to reduce the vocabulary required and to increase accuracy of the engine. Nevertheless, each engine may still require a vocabulary of 50,000 to 100,000 words in order to accurately convert speech to text for any potential user in a given category (in order to match any potential spoken word with a known word in the vocabulary).
Further compounding the problem is that each user of a particular brand of speech recognition software must train that software for it to be accurate. At least two to three hours of training are typically required. Although certain speech engines advertise that no training is required, realistically, at least a minimal amount of training is needed; otherwise, accuracy suffers. It is not uncommon for a professional user of speech recognition software to spend many hours training that software in order to achieve the highest accuracy. In addition, a user or enterprise must deal with the mechanics of installing and maintaining speech recognition software, which can be a great burden. The software must be selected based upon available computers, purchased, installed and maintained. Problems with computer compatibility, lack of memory, etc., are not uncommon. Many installed copies of speech recognition software are out of date (and hence less accurate) because the user or enterprise has not bothered to update the software.
Finally, once the user has selected a particular brand of speech recognition software and has installed and trained that software, there is no guarantee that the user's words will be transcribed accurately. Due to pronunciation, diction, speed of delivery, cadence, voice changes due to illness, etc., the chosen speech recognition software may still produce text that has errors. Even the best software under optimal conditions can find it difficult to achieve a 95% accuracy rate. Based upon the above state of technology and the needs of individuals, a technique and speech-to-text engine are desired that would provide greater accuracy with reduced or no training.