Speech recognition is a field in which significant research and development has taken place. The U.S. Department of Defense began sponsoring studies in the late 1940s, and commercially led advances, by companies such as Bell Laboratories and IBM, followed shortly thereafter. Today, speech recognition tools exist for a wide range of applications, including assistance for the deaf, voice commands for electronic devices such as computers, and identification of the words that make up voice-based commercial interactions (such as in customer support or telemarketing settings).
Conventional speech recognition techniques function by identifying a single “best” match for a spoken word or phrase. A conventional speech recognition tool receives a spoken word or phrase, converts it to an electronic format, matches its component sounds to a collection of reference data (a “lexicon,” which may include up to tens of thousands of words that the tool has been configured to recognize), identifies a collection of possible matches (“alternatives”) for the spoken word or phrase, and assigns each alternative a probability that it represents the word or phrase that was actually spoken. Any of numerous techniques may be used to identify alternatives for the spoken word or phrase, to assign each alternative a corresponding probability of being correct, or both. One prevalent technique is the mathematical modeling method known as the Hidden Markov Model (HMM). Briefly, an HMM-based tool builds a decision tree with a node for each alternative it identifies and, based on the characteristics of the combination of words at each node, determines the probability that each node is correct relative to the other nodes. Once a probability has been assigned to each alternative, conventional speech recognition tools select the alternative with the highest probability as the correct one.
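The final selection step described above can be illustrated with a minimal sketch. It assumes the alternatives and their probabilities have already been produced by some upstream model; the `Alternative` type, the `select_best` function, and the sample phrases and probabilities are hypothetical and chosen only for illustration, not taken from any particular speech recognition tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Alternative:
    """One candidate transcription and its assigned probability (hypothetical type)."""
    text: str
    probability: float


def select_best(alternatives: Sequence[Alternative]) -> Alternative:
    """Return the single alternative with the highest probability.

    This mirrors the conventional "single best match" behavior: every
    other alternative is discarded, however close its probability is.
    """
    if not alternatives:
        raise ValueError("no alternatives to choose from")
    return max(alternatives, key=lambda alt: alt.probability)


# Illustrative values only; a real tool would obtain these from its model.
alternatives = [
    Alternative("recognize speech", 0.62),
    Alternative("wreck a nice beach", 0.27),
    Alternative("recognized peach", 0.11),
]

best = select_best(alternatives)
print(best.text)
```

Because only the top-ranked alternative survives, any error in the probability assignment carries directly into the output, even when the correct phrase was among the alternatives.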
Speech recognition efforts are plagued by significant technical obstacles, brought on by the highly variable nature of speech patterns. In particular, the identification of alternatives and corresponding probabilities for spoken words or phrases is complicated by varying adherence to grammatical correctness, context, accents, and countless other linguistic variables. For decades, these challenges have made the precise recognition of a word or phrase, from among tens of thousands of possibilities, a very difficult endeavor. Indeed, speech recognition remains extremely problematic and prone to error.