Pattern recognition systems are used, for example, for the recognition of characters and speech patterns.
Pattern recognition systems are known which are based on matching the pattern being tested against a reference database of pattern templates. The spectral distance between the test pattern and the database of reference patterns is measured and the reference pattern having the closest spectral distance to the test pattern is chosen as the recognized pattern.
An example of the prior art pattern recognition system using a distance measure calculation is shown in FIGS. 1, 2 and 3, to which reference is now made. FIG. 1 is a flow chart illustrating the prior art pattern recognition system for speech patterns using a conventional linear predictor coefficient (LPC) determiner and a distance calculator via dynamic time warping (DTW). FIG. 2 illustrates the relationship between two speech patterns A and B, along i-axis and j-axis, respectively. FIG. 3 illustrates the relationship between two successive points of pattern matching between speech patterns A and B.
Referring to FIG. 1, the audio signal 10 being analyzed has within it a plurality of speech patterns. Audio signal 10 is digitized by an analog/digital converter 12 and the endpoints of each speech pattern are detected by a detector 14. The digital signal of each speech pattern is broken into frames and, for each frame, analyzer 16 computes the linear predictor coefficients (LPC) and converts them to cepstrum coefficients, which are the feature vectors of the test pattern. Reference patterns, which have been prepared as templates, are stored in a database 18. A spectral distance calculator 20 uses a dynamic time warping (DTW) method to compare the test pattern to each of the reference patterns stored in database 18. The DTW method measures the local spectral distance between the test pattern and the reference pattern, using a suitable method of measuring spectral distance, such as the Euclidean distance between the cepstral coefficients or the weighted cepstral distance measure. The template whose reference pattern is closest in distance to the analyzed speech pattern is then selected as being the recognized speech pattern.
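The conversion performed by analyzer 16 from predictor coefficients to cepstrum coefficients can be sketched with the commonly used recursion shown below. This is an illustrative sketch only: the function name `lpc_to_cepstrum`, the sign convention H(z) = 1 / (1 - sum of a_k z^-k), and the omission of the gain term c_0 are assumptions and are not taken from FIG. 1.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p into cepstrum coefficients c_1..c_n_ceps.

    Assumption: all-pole model H(z) = 1 / (1 - sum_k a_k z^-k).
    The gain-dependent term c_0 is not computed.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused padding so that c[n] holds c_n
    for n in range(1, n_ceps + 1):
        # The direct term a_n exists only while n does not exceed the model order p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_(n-k), with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


if __name__ == "__main__":
    # For a first-order model with a_1 = 0.5 the cepstrum is c_n = 0.5**n / n.
    print(lpc_to_cepstrum([0.5], 3))
```

The resulting list would serve as the feature vector of one frame; the number of cepstrum coefficients kept per frame is a design choice not specified here.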
In a paper, entitled "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", published by the IEEE Transactions on Acoustics, Speech and Signal Processing in February 1978, Sakoe and Chiba reported on a dynamic programming (DP) based algorithm for recognizing spoken words. DP techniques are known to be an efficient way of matching speech patterns. Sakoe and Chiba introduced the technique known as "slope constraint", wherein the warping function slope is restricted so as to discriminate between words in different categories.
Numerous spectral distance measures have been proposed including the Euclidean distance between cepstral coefficients which is widely used with LPC-derived cepstral coefficients. Furui in a paper, entitled "Cepstral Analysis Techniques for Automatic Speaker Verification", published by the IEEE Transactions on Acoustics, Speech and Signal Processing in April 1981, proposed a weighted cepstral distance measure which further reduces the percentage of errors in recognition.
In a paper, entitled "A Weighted Cepstral Distance Measure for Speech Recognition", published by the IEEE Transactions on Acoustics, Speech and Signal Processing in October 1987, Tohkura proposed an improved weighted cepstral distance measure as a means to improve the speech recognition rate.
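The two families of local distance measure mentioned above can be sketched as follows. The function names and the `weights` argument are hypothetical; in particular, the weights are left to the caller and do not reproduce the specific weighting schemes of the cited papers.

```python
import math


def euclidean_cepstral_distance(x, y):
    """Euclidean distance between two cepstral feature vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def weighted_cepstral_distance(x, y, weights):
    """Weighted variant: each squared coefficient difference is scaled by weights[k].

    Assumption: the weights are non-negative and supplied by the caller.
    """
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))


if __name__ == "__main__":
    test_frame = [0.0, 0.0]
    ref_frame = [3.0, 4.0]
    print(euclidean_cepstral_distance(test_frame, ref_frame))
    print(weighted_cepstral_distance(test_frame, ref_frame, [4.0, 0.0]))
```

With unit weights the weighted measure reduces to the Euclidean one, which makes the two easy to compare on the same data.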
Referring now to FIG. 2, the operation of the DTW method will be explained. In FIG. 2, speech patterns A and B are shown along the i-axis and j-axis, respectively. Speech patterns A and B are expressed as sequences of feature vectors $a_1, a_2, a_3, \ldots, a_m$ and $b_1, b_2, b_3, \ldots, b_n$, respectively.
The timing differences between the two speech patterns A and B can be depicted by a series of "points" Ck(i,j). A "point" refers to the intersection of a frame i of pattern A with a frame j of pattern B. The sequence of points C1, C2, C3 . . . Ck represents a warping function 30 which maps the time axis of pattern A, having a length m, onto the time axis of pattern B, having a length n. In the example of FIG. 2, function 30 is represented by points C1(1,1), C2(1,2), C3(2,2), C4(3,3), C5(4,3) . . . Ck(m,n). Where timing differences do not exist between speech patterns A and B, function 30 coincides with the 45 degree diagonal line (j=i). The greater the timing differences, the further function 30 deviates from the 45 degree diagonal line.
Since function 30 is a model of time axis fluctuations in a speech pattern, it must abide by certain physical conditions. Function 30 can only advance forward and cannot move backwards, and the patterns must advance together. These restrictions can be expressed by the following relationships:

$$i(k) - i(k-1) \le 1 \ \text{and}\ j(k) - j(k-1) \le 1; \quad i(k-1) \le i(k) \ \text{and}\ j(k-1) \le j(k). \qquad (1)$$
Warping function 30 moves one step at a time from one of three possible directions. For example, to move from C3(2,2) to C4(3,3), function 30 can either move directly in one step from (2,2) to (3,3) or indirectly via the points at (2,3) or (3,2).
Function 30 is further restricted to remain within a swath 32 having a width r. The outer borders 34 and 36 of swath 32 are defined by (j=i+r) and (j=i-r), respectively.
A fourth boundary condition is defined by:

$$i(1) = 1,\ j(1) = 1, \ \text{and}\ i(\text{end}) = m,\ j(\text{end}) = n. \qquad (2)$$
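The conditions placed on function 30 (relationships (1), the swath of width r, and boundary condition (2)) can be checked for a candidate sequence of points as in the sketch below. The function name `is_valid_warping_path` is hypothetical, and rejecting a step that advances neither index is an added assumption based on the statement that the patterns must advance.

```python
def is_valid_warping_path(path, m, n, r):
    """Check a list of 1-based points (i, j) against the warping constraints.

    m and n are the lengths of patterns A and B; r is the swath width.
    """
    # Boundary condition (2): start at (1, 1) and end at (m, n).
    if not path or path[0] != (1, 1) or path[-1] != (m, n):
        return False
    # Swath: every point must lie between j = i - r and j = i + r.
    if any(abs(j - i) > r for i, j in path):
        return False
    # Relationships (1): each index advances by 0 or 1 per step, never backwards.
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        di, dj = i1 - i0, j1 - j0
        if not (0 <= di <= 1 and 0 <= dj <= 1):
            return False
        if di == 0 and dj == 0:  # assumption: a step must advance at least one index
            return False
    return True


if __name__ == "__main__":
    # First five points of the FIG. 2 example, treated here as a complete path (m=4, n=3).
    example = [(1, 1), (1, 2), (2, 2), (3, 3), (4, 3)]
    print(is_valid_warping_path(example, m=4, n=3, r=1))
```

Note that when the two pattern lengths differ by more than r, the endpoint (m,n) itself lies outside the swath and no path can satisfy all of the conditions.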
Referring now to FIG. 3, the relationship between successive points C10(10,10) and C11(11,11) of pattern matching between speech patterns A and B is illustrated by way of example. In accordance with the conditions described hereinbefore, there are three possible ways to arrive at point C11(11,11): directly from C10(10,10) to C11(11,11), indicated by line 38; from C10(10,10) via point (11,10) to C11(11,11), indicated by lines 40 and 42; or from C10(10,10) via point (10,11) to C11(11,11), indicated by lines 44 and 46.
Furthermore, associated with each arrival point (i,j), such as point C11(11,11), is a weight $W_{ij}$, such as the Euclidean or cepstral distance between the ith frame of pattern A and the jth frame of pattern B. By applying a weight $W_{ij}$ to each of indirect paths 40, 42, 44 and 46 and a weight of $2W_{ij}$ to direct path 38, the path value $S_{ij}$ at the point (i,j) can be recursively ascertained from the equation:

$$S_{ij} = \min\left\{ S_{i-1,j-1} + 2W_{ij},\ \ S_{i-1,j} + W_{ij},\ \ S_{i,j-1} + W_{ij} \right\}$$
In order to arrive at the endpoint value $S_{mn}$, it is necessary to calculate the best path value $S_{ij}$ at each point. The points are scanned row by row, and the values of $S_{ij}$ for the complete previous row plus the values of the present row up to the present point are stored. The value of $S_{mn}$ is the best path value.
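The row-by-row scan can be sketched as follows, keeping only the previous row and the present row in memory. The sketch assumes the recursion implied by the weights described above (a weight of 2W for the direct step and W for each indirect step) and confines the search to the swath of width r. The function name `dtw_path_value`, the `local_distance` callback, and the starting value of 2W at point (1,1) are assumptions that are not stated in the text.

```python
import math


def dtw_path_value(a, b, r, local_distance):
    """Best path value at the endpoint for patterns a (length m) and b (length n).

    a and b are sequences of feature vectors; r is the swath width;
    local_distance(x, y) returns the weight W for one pair of frames.
    Returns math.inf when no path inside the swath reaches the endpoint.
    """
    m, n = len(a), len(b)
    prev = [math.inf] * (n + 1)  # row i-1, indexed by j; index 0 is padding
    for i in range(1, m + 1):
        cur = [math.inf] * (n + 1)  # row i, filled from left to right
        for j in range(max(1, i - r), min(n, i + r) + 1):
            w = local_distance(a[i - 1], b[j - 1])
            if i == 1 and j == 1:
                cur[j] = 2 * w  # assumption: starting value at point (1, 1)
            else:
                cur[j] = min(
                    prev[j - 1] + 2 * w,  # direct (diagonal) step
                    prev[j] + w,          # step that advances i only
                    cur[j - 1] + w,       # step that advances j only
                )
        prev = cur
    return prev[n]


if __name__ == "__main__":
    def scalar_distance(x, y):
        # One-dimensional "feature vectors" keep the example easy to follow.
        return abs(x - y)

    print(dtw_path_value([0, 1, 2], [0, 1, 2], r=1, local_distance=scalar_distance))
    print(dtw_path_value([0, 0], [1, 1], r=1, local_distance=scalar_distance))
```

In a recognizer of the kind shown in FIG. 1, this value would be computed against each reference pattern in turn and the template with the smallest value selected.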