1. Technical Field
The present invention relates in general to improved speech recognition systems and in particular to an improved method and system for enhanced speech recognition accuracy. Still more particularly, the present invention relates to a method and system for enhanced speech recognition in a mobile system utilizing location-specific libraries of speech templates and an identification of the system location.
2. Description of the Related Art
Speech recognition is well known in the prior art. The recognition of isolated words from a given vocabulary for a known speaker is perhaps the simplest type of speech recognition and this type of speech recognition has been known for some time. Words within the vocabulary to be recognized are typically prestored as individual templates, each template representing the sound pattern for a word in the vocabulary. When an isolated word is spoken, the system merely compares the word to each individual template in the vocabulary. This technique is commonly referred to as whole-word template matching. Many successful speech recognition systems use this technique with dynamic programming to cope with nonlinear time scale variations between the spoken word and the prestored template.
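By way of illustration only, whole-word template matching with dynamic programming may be sketched as follows. The sketch assumes each word is reduced to a short sequence of scalar feature values and uses an unconstrained dynamic time warping recurrence; the function names `dtw_distance` and `recognize` and the sample templates are hypothetical and are not taken from any of the cited systems.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best nonlinear time alignment of a with b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # frame-to-frame distance
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical two-word vocabulary; the utterance is a time-stretched "yes".
templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1]}
best_word = recognize([1, 1, 3, 3, 4, 3, 1], templates)
```

Because the recurrence permits a frame of one sequence to align with several frames of the other, a word spoken more slowly or quickly than its template can still match at low cost.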
Of greater difficulty is the recognition of continuous speech or speech which contains proper names or place names. Continuous speech, or connected words, has been recognized in the prior art utilizing multiple path dynamic programming. One example of such a system is proposed in "Two-Level DP-Matching: A Dynamic Programming-Based Pattern Matching Algorithm for Connected Word Recognition," H. Sakoe, IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume ASSP-27, No. 6, pages 588-595, December 1979. This paper suggests a two-pass dynamic programming algorithm to find a sequence of word templates which best matches the whole input pattern. The first pass generates scores which indicate the similarity between every template and every possible portion of the input pattern. In the second pass these scores are utilized to find the best sequence of templates corresponding to the whole input pattern.
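The two-pass idea described above may be sketched, in simplified form, as follows. This is not the published algorithm, which includes slope constraints and other refinements; it is a minimal illustration assuming scalar feature sequences, and the names `dtw` and `two_level_match` and the sample templates are hypothetical.

```python
def dtw(a, b):
    """Unconstrained dynamic time warping cost between sequences a and b."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def two_level_match(signal, templates):
    """Return the template sequence that best explains the whole signal."""
    n = len(signal)
    inf = float("inf")
    # First pass: score every template against every portion signal[s:e]
    # and remember the best (score, word) pair for each portion.
    seg = {}
    for s in range(n):
        for e in range(s + 1, n + 1):
            seg[s, e] = min((dtw(signal[s:e], t), w) for w, t in templates.items())
    # Second pass: choose portion boundaries minimizing the summed scores.
    # best[e] = (total score, start of last portion, word of last portion)
    best = [(inf, None, None)] * (n + 1)
    best[0] = (0.0, None, None)
    for e in range(1, n + 1):
        for s in range(e):
            score, word = seg[s, e]
            total = best[s][0] + score
            if total < best[e][0]:
                best[e] = (total, s, word)
    # Trace back from the end of the signal to recover the word sequence.
    words, e = [], n
    while e > 0:
        _, s, word = best[e]
        words.append(word)
        e = s
    return words[::-1]


# Hypothetical templates; the signal is a stretched "a" followed by a stretched "b".
sequence = two_level_match([1, 2, 2, 1, 5, 5, 6, 5], {"a": [1, 2, 1], "b": [5, 6, 5]})
```

The exhaustive first pass makes the cost of this sketch grow rapidly with input length and vocabulary size, which illustrates why continuous speech is considerably harder than isolated-word recognition.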
U.S. Pat. No. 5,040,127 proposes a continuous speech recognition system which processes continuous speech by comparing input frames against prestored templates which represent speech and then creating links between records in a linked network for each template under consideration as a potentially recognized individual word. The linked records include ancestor and descendent link records which are stored as indexed data sets with each data set including a symbol representing a template, a sequence indicator representing the relative time the link record was stored and a pointer indicating a link record in the network from which it descends.
The recognition of proper names represents an increase in so-called "perplexity" for speech recognition systems, and this difficulty has recently been recognized in U.S. Pat. No. 5,212,730. This patent performs name recognition utilizing text-derived recognition models for recognizing the spoken rendition of proper names which are susceptible to multiple pronunciations. A name recognition technique set forth within this patent involves entering the name-text into a text database which is accessed by designating the name-text, and thereafter constructing a selected number of text-derived recognition models from the name-text, wherein each text-derived recognition model represents at least one pronunciation of the name. Thereafter, for each attempted access to the text database by a spoken name input, the text database is compared with the spoken name input to determine if a match may be accomplished.
U.S. Pat. No. 5,202,952 discloses a large-vocabulary continuous-speech prefiltering and processing system which recognizes speech by converting the utterances to frame data sets, wherein each frame data set is smoothed to generate a smoothed frame model over a predetermined number of frames. Clusters of word models which are acoustically similar over a succession of frame periods are designated as a resident vocabulary, and a cluster score is then generated by the system which includes the likelihood of the smoothed frames evaluated utilizing a probability model for the cluster against which the smoothed frame model is being compared.
Each of these systems recognizes that successful speech recognition requires a reduction in the perplexity of a continuous-speech utterance. Publications which address this problem are "Perplexity: A Measure of Difficulty of Speech Recognition Tasks," Journal of the Acoustical Society of America, Volume 62, Supplement No. 1, page S-63, Fall 1977, and "Continuous Speech Recognition: Statistical Methods" in the Handbook of Statistics, Volume 2: Classification, Pattern Recognition and Reduction of Dimensionality, pages 549-573, North-Holland Publishing Company, 1982.
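For illustration, one common formulation of perplexity is two raised to the average negative base-2 logarithm of the probability a language model assigns to each word of a test sequence. The sketch below assumes those per-word probabilities are already available; the function name `perplexity` and the sample values are hypothetical.

```python
import math


def perplexity(word_probabilities):
    """Perplexity of a word sequence given the probability assigned to each word."""
    n = len(word_probabilities)
    # Average number of bits needed per word (cross-entropy estimate).
    entropy = -sum(math.log2(p) for p in word_probabilities) / n
    return 2.0 ** entropy


# A recognizer choosing uniformly among 1000 candidate words at every step
# faces a perplexity of about 1000; narrowing the choice to 10 equally
# likely candidates lowers it to about 10.
large_vocabulary = perplexity([1 / 1000] * 5)
small_vocabulary = perplexity([1 / 10] * 5)
```

Under this formulation, perplexity can be read as the effective number of equally likely words among which the recognizer must choose at each point, so restricting the candidate vocabulary directly reduces it.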
In view of the above, it should be apparent that successful speech recognition requires an enhanced ability to distinguish between large numbers of like-sounding words, a problem which is particularly difficult with proper names, place names, and numbers. It should therefore be apparent that a need exists for a method and system which enhances the accuracy and efficiency of speech recognition.