1. Field of the Invention
The present invention relates generally to the field of speech recognition and specifically to a speech recognition system which is optimized for distinguishing between similar utterances.
2. Description of the Prior Art
Most speech recognition systems operate, at least at a high level of abstraction, in substantially the same manner. Spoken words are converted to electrical signals which are then analyzed to generate a sequence of tokens representing specific sounds. These tokens are then analyzed to determine which word or words correspond to the sequence of tokens. The words so determined are provided as the output of the speech recognition system.
Due to variations in the pronunciation of individual words by a single speaker in different contexts and the even larger variations in pronunciation among several speakers, the actual methods used for speech recognition are highly probabilistic. Consequently, a given word spoken in a slightly different manner by one speaker, or by different speakers, can be correctly identified by the speech recognition system as long as the utterance stays within the probabilistic model of the given word and as long as there is no significant overlap between that model and a model representing a similar but different word.
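The effect of overlapping word models can be illustrated with a minimal sketch. It assumes, purely for illustration, that each word is modeled by a one-dimensional Gaussian over a single acoustic feature; the function names and the numeric values are hypothetical and are not taken from any system described here.

```python
import math

def gaussian_pdf(x, mean, std):
    """Likelihood of feature value x under a 1-D Gaussian word model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior_a(x, model_a, model_b):
    """Probability that x came from model_a, assuming equal prior probabilities."""
    pa = gaussian_pdf(x, *model_a)
    pb = gaussian_pdf(x, *model_b)
    return pa / (pa + pb)

# Well-separated models: an utterance near model A is identified with confidence.
separated = posterior_a(1.0, (0.0, 1.0), (10.0, 1.0))

# Heavily overlapping models (similar-sounding words): an utterance midway
# between the two means is equally likely under both models.
overlapping = posterior_a(0.25, (0.0, 1.0), (0.5, 1.0))
```

With well-separated models the posterior is close to 1. With overlapping models it falls to 0.5, so the acoustic evidence alone cannot distinguish the two words.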
Many speech recognition systems recognize that the overlap of similar word models is inevitable and use a model of the spoken language to distinguish between similar words based on the context in which the word appears. These models may, for example, check a sequence of spoken words for grammatical correctness or for likelihood of occurrence based on a relatively large text sample.
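A language model of the likelihood-of-occurrence type may be sketched as follows. The sketch assumes a bigram model with add-one smoothing trained on a very small, made-up text sample; the names `train_bigrams`, `pick_candidate` and `SAMPLE` are hypothetical, and practical systems use far larger samples and more elaborate models.

```python
from collections import Counter

def train_bigrams(text):
    """Count word pairs and single words in a text sample."""
    words = text.lower().split()
    return Counter(zip(words, words[1:])), Counter(words)

def pick_candidate(previous_word, candidates, bigrams, unigrams):
    """Choose the acoustically confusable candidate most likely to follow previous_word."""
    vocab_size = len(unigrams)

    def score(word):
        # Add-one smoothed estimate of P(word | previous_word).
        return (bigrams[(previous_word, word)] + 1) / (unigrams[previous_word] + vocab_size)

    return max(candidates, key=score)

# Hypothetical text sample used only for this illustration.
SAMPLE = "i want to go to the store . i have two dogs . she has two cats . we go to school"
bigrams, unigrams = train_bigrams(SAMPLE)

after_have = pick_candidate("have", ["to", "two", "too"], bigrams, unigrams)
after_go = pick_candidate("go", ["to", "two", "too"], bigrams, unigrams)
```

In this sample, "two" is selected after "have" and "to" is selected after "go", although the candidate words sound alike. When the preceding word has never been observed with any of the candidates, all scores are equal and the context provides no basis for a choice.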
In spite of these techniques, some words which sound alike may still be difficult to identify if there is no context base which may be used to determine their likelihood of occurrence, or if they occur interchangeably with similar words in substantially identical contexts. Exemplary words of this type are the names of the letters B, C, E, G, P, T, V and Z. These words are of particular importance because many speech recognition systems include a mode which allows the speaker to spell a word which is not likely to be in the dictionary of words recognized by the system. In this mode, the language model may not be helpful, even if it were designed to include English language spelling conventions, if the spelled word is a foreign word which does not follow those conventions. Moreover, even if the spelled word were an English word, it would be difficult for the language model to encompass the relatively large number of English language spelling rules.
U.S. Pat. No. 4,759,068 to Bahl et al. relates to a method by which a string of tokens derived from spoken words is analyzed to derive a sequence of individual fenemes which most closely corresponds to the spoken words. This patent discloses the structure of a typical speech recognition system in detail.
U.S. Pat. No. 4,559,604 to Ichikawa et al. relates to a pattern recognition system in which an input pattern is compared against a set of standard patterns to define a set of patterns that are more likely than any other patterns to be a match for the input pattern. A specific one of these selected patterns is inferred as the most likely based on one of four preferred criteria of inference.
U.S. Pat. No. 4,363,102 to Holmgren et al. relates to a speaker identification system which develops a plurality of templates corresponding to known words spoken by a corresponding plurality of speakers. An individual speaker is identified as having the smallest probabilistic distance between his spoken words and the templates corresponding to one of the known speakers.