Our invention is related to speech analysis and more particularly to arrangements for automatically recognizing a speech pattern.
In processing, control and communication systems, it is often advantageous to use speech as input for information, data and commands. Speech input signals may be utilized to automatically record transaction data in data processing equipment or request information from data processing equipment over telephone or other types of voice connections. Speech recognition facilities permit an operator to interact with data processing or control equipment by voice without interruption of other activities. The successful use of voice signals in such applications requires that utterances be recognized as particular words or phrases. Accurate recognition, however, is difficult because of the complexity of speech patterns and their variability from speaker to speaker and even for a particular speaker.
In many known speech recognition systems, an input speech pattern is analyzed to provide a set of features characteristic thereof. Such feature signals may be derived through a spectral, linear prediction, or other analysis of successive time intervals of the speech pattern. Initially, the recognition apparatus is trained by generating feature signal templates for utterances of identified reference patterns. Subsequent to the storage of the reference pattern feature signal templates, an unknown utterance is analyzed and the sequence of feature signals for the unknown utterance is compared to the template sequence of each reference pattern. After the comparisons are completed, the unknown utterance is identified as the reference pattern whose feature signals most closely correspond to the feature signals of the unknown utterance.
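By way of illustration only, the derivation of linear prediction feature signals for successive time intervals may be sketched as follows. The sketch assumes the autocorrelation method with a Hamming window and the Levinson-Durbin recursion, none of which is specified in the text above; the function names and the frame length, hop, and order parameters are hypothetical.

```python
import math


def autocorrelation(frame, order):
    """Return autocorrelation values r[0..order] for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1 and the prediction error filter is
    e[n] = s[n] + a[1]*s[n-1] + ... + a[order]*s[n-order].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_features(samples, frame_len, hop, order):
    """Return one (a, r) pair per frame: LPC vector and autocorrelation values."""
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]  # Hamming window (an assumption)
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        r = autocorrelation(frame, order)
        a, _ = levinson_durbin(r, order)
        feats.append((a, r))
    return feats
```

Each frame yields a vector of (order + 1) coefficients together with the autocorrelation values of that frame, which is the information a frame-to-frame comparison of LPC features would require.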
The comparison of the feature sequence of an utterance to the feature sequence of a reference template requires time alignment of the feature sequences to account for differences in speech rate and articulation and measurement of the similarity of corresponding features. The log likelihood ratio for linear prediction coefficient (LPC) features of linear prediction analysis disclosed in the article, "Minimum Prediction Residual Principle Applied to Speech Recognition" by F. Itakura, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-23, No. 1, pp. 67-72, February 1975, is of the form:
$$d(U,R)=\log\left[\frac{a_R V_U a'_R}{a_U V_U a'_U}\right] \tag{1}$$
and provides a high degree of recognition accuracy with relatively little processing. Here $a_R$ is the vector of the $(p+1)$ linear prediction coefficients of a $p$-th order LPC model of the reference, $a_U$ is the corresponding vector for the utterance, and $V_U$ is the $(p+1)\times(p+1)$ autocorrelation matrix of the utterance pattern frame. Time alignment optimization is generally accomplished, as is well known in the art, by dynamic programming. As applied to speech recognition, dynamic programming is used to determine the total distance between an utterance feature sequence, e.g.,
$$U=[U(1), U(2), \ldots, U(n), \ldots, U(N)] \tag{2}$$
and a feature sequence for the $k$-th reference, e.g.,
$$R_k=[R_k(1), R_k(2), \ldots, R_k(m), \ldots, R_k(M_k)] \tag{3}$$
over all acoustically possible paths
$$m=w(n) \tag{4}$$
in accordance with ##EQU1##
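By way of illustration only, the log likelihood ratio of equation (1) and a dynamic programming time alignment may be sketched as follows. The sketch assumes that the total distance is the minimum, over warping paths w(n), of the sum of frame distances between U(n) and R_k(w(n)), which is a commonly used form and not necessarily the exact expression intended above. It further assumes simplified path constraints: the path starts at the first frames, ends at the last frames, and advances by 0, 1, or 2 reference frames per utterance frame. All function names are hypothetical.

```python
import math


def quad_form(a, r):
    """Return a V a', where V[i][j] = r[|i - j|] is a Toeplitz autocorrelation matrix."""
    size = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)]
               for i in range(size) for j in range(size))


def itakura_distance(a_ref, a_utt, r_utt):
    """Log likelihood ratio of equation (1) for one utterance frame."""
    return math.log(quad_form(a_ref, r_utt) / quad_form(a_utt, r_utt))


def lpc_frame_distance(utt_frame, a_ref):
    """Frame distance when utterance frames are (a_U, r_U) pairs and
    reference frames are LPC vectors a_R (an assumed data layout)."""
    a_utt, r_utt = utt_frame
    return itakura_distance(a_ref, a_utt, r_utt)


def dtw_distance(utt, ref, local_dist):
    """Minimum accumulated distance over monotone paths m = w(n).

    Assumed constraints: w maps the first utterance frame to the first
    reference frame, the last to the last, and advances by 0, 1, or 2
    reference frames per utterance frame. Returns infinity if no such
    path exists.
    """
    inf = float("inf")
    n_ref = len(ref)
    prev = [inf] * n_ref
    prev[0] = local_dist(utt[0], ref[0])
    for n in range(1, len(utt)):
        cur = [inf] * n_ref
        for m in range(n_ref):
            best = min(prev[m - step] for step in (0, 1, 2) if m - step >= 0)
            if best < inf:
                cur[m] = best + local_dist(utt[n], ref[m])
        prev = cur
    return prev[n_ref - 1]


def recognize(utt, templates, local_dist):
    """Return the name of the template with the smallest total distance,
    together with the distance to every template."""
    scores = {name: dtw_distance(utt, ref, local_dist)
              for name, ref in templates.items()}
    return min(scores, key=scores.get), scores
```

For LPC features, `recognize(utterance_frames, templates, lpc_frame_distance)` would compare an utterance stored as a list of (a_U, r_U) pairs against templates stored as lists of reference LPC vectors; any other frame distance taking an utterance frame and a reference frame may be substituted.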
While recognition systems using spectral or linear prediction analysis are well suited to identifying speech patterns, errors occur if the speech patterns to be recognized exhibit anomalies that are not reflected in the spectral or prediction parameters. For example, anomalies such as partially voiced words or lip smacks may cause a speech pattern to match incorrect reference words having similar spectral patterns but differing in other respects. It is an object of the invention to provide improved speech recognition in the presence of anomalies in the pattern to be recognized.