1. Field of the Invention
The present invention relates generally to a speech recognition system which adopts a speech matching process based on dynamic programming (hereinafter referred to as "DP"). More particularly, the invention is concerned with a DP matching system for speech recognition which offers an enhanced matching capability while allowing the use of a memory of reduced capacity for storing DP data, as a result of optimization by DP path pruning techniques.
2. Description of the Prior Art
One of the most fundamental and successful concepts in speech recognition is that of nonlinearly time-aligning an unknown input utterance pattern with reference patterns stored previously. Such approaches, however, result in path finding problems. In the case of continuous speech recognition, there is typically a huge number of possible paths, which requires a careful organizational plan. For a small search space as in connected word recognition, for example, a dynamic programming (DP) algorithm in combination with a pruning strategy provides an efficient technique for performing the search. However, applying such a DP approach to a large search space can incur significant computational overhead and expense because of the huge number of paths along which computations have to be performed during matching.
Accordingly, a few approaches have been proposed for reducing the number of DP paths. One such approach will be addressed here in some detail to allow a better understanding of the background of the present invention. Reference will be made to FIGS. 10 to 13 of the accompanying drawings, in which FIG. 10 is a functional block diagram illustrating the concept underlying a DP matching system for speech recognition known heretofore, FIG. 11 depicts a DP path pruning method with which the present invention is also concerned, FIG. 12 is a block diagram depicting a structure of a DP matching system for speech recognition, and FIG. 13 is a flow chart illustrating the DP matching processing executed by the system depicted in FIG. 12.
Referring now to FIG. 10, in the DP matching system for speech recognition, speech feature vectors (also referred to simply as features) are extracted from input utterances in the form of time-serial input speech patterns by a time-serial pattern reading module 10 to be stored in a time-serial input speech pattern storage module 11. Subsequently, these time-serial input speech patterns are matched with standardized time-serial reference speech patterns stored previously in a reference speech pattern storage module 12 through a DP matching process. As a result of the DP matching, the reference speech pattern having the minimum cumulative distance value from the input speech pattern is outputted as the result of the matching.
More specifically, let $a_i$ denote a series of speech features sampled or extracted sequentially at a series of discrete sampling time points, respectively. These speech features $a_i$ are stored in the time-serial input speech pattern memory 11 through the speech pattern extracting module 10, whereupon local inter-pattern distances d(i, j) between the speech features $a_i$ represented by the time-serial input speech patterns and reference features $b_j$ of the time-serial reference speech patterns mentioned above are determined. On the basis of the local inter-pattern distances d(i, j), cumulative distances G(i, j) are computed in accordance with the following formulas:

$$G(1, 1) = d(1, 1), \qquad G(i, j) = d(i, j) + \min\{G(i-1, j),\ G(i-1, j-1)\} \tag{1}$$
Subsequently, an optimal path is selected from all the possible DP paths on the basis of the cumulative distances G(i, j).
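By way of illustration only, the unpruned computation of expression (1) may be sketched as follows. The sketch makes several assumptions that are not stated in the source: the features are scalars, the local distance d(i, j) is an absolute difference, cells that cannot be reached from G(1, 1) are treated as infinite, and the function names `local_distance` and `dp_match` are hypothetical.

```python
import math

def local_distance(a, b):
    # Assumption: scalar features compared by absolute difference.
    # The source does not specify how d(i, j) is defined.
    return abs(a - b)

def dp_match(inputs, reference):
    """Return the cumulative distance G(I, J) under expression (1).

    Indices are 0-based here, so G[0][0] corresponds to G(1, 1).
    """
    num_in, num_ref = len(inputs), len(reference)
    G = [[math.inf] * num_ref for _ in range(num_in)]
    G[0][0] = local_distance(inputs[0], reference[0])  # G(1, 1) = d(1, 1)
    for i in range(1, num_in):
        for j in range(num_ref):
            stay = G[i - 1][j]                                # G(i-1, j)
            advance = G[i - 1][j - 1] if j > 0 else math.inf  # G(i-1, j-1)
            best_prev = min(stay, advance)
            if best_prev < math.inf:  # skip cells with no valid predecessor
                G[i][j] = local_distance(inputs[i], reference[j]) + best_prev
    return G[-1][-1]
```

In this sketch every cell of the table is kept, so the storage grows with the product of the input length and the reference length; this is the cost which the pruning approach described next seeks to reduce.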
FIG. 11A is a diagram which schematically illustrates paths for computing cumulative distances, in which the time axis of the input speech patterns is taken along the abscissa and the time axis of the reference speech patterns is taken along the ordinate. Methods have been proposed for limiting the DP path computation at each succeeding sampling time point i. One such method is illustrated in FIG. 11B. According to this method, the DP paths are computed only from the cumulative distances G(i-1, j) and G(i-1, j+3), which are smaller than a given threshold value at a time point (i-1). Consequently, at the succeeding time point i, the cumulative distance G(i, j+2) is not computed, as indicated by a cross mark "x" in FIG. 11B, because neither of its predecessors G(i-1, j+2) and G(i-1, j+1) has been retained. In this manner, the overhead involved in the DP path computation can be reduced correspondingly, which in turn means that the number of cumulative distances to be stored at successive time points can be reduced significantly. As such, the capacity of a cumulative distance memory for storing the cumulative distance data for the DP path computation can be reduced correspondingly.
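The situation of FIG. 11B may be sketched as follows, under the path constraints of expression (1), in which a retained value G(i-1, j) can only lead to G(i, j) and G(i, j+1). The function name `reachable_cells` and the 1-based indexing of the reference features are assumptions made for illustration.

```python
def reachable_cells(survivors, num_ref):
    """Return the reference indices j for which G(i, j) is computed.

    survivors: reference indices (1-based) whose cumulative distances
               were retained at time point i-1.
    num_ref:   number of features in the reference pattern.
    """
    cells = set()
    for j in survivors:
        cells.add(j)           # path from G(i-1, j) to G(i, j)
        if j + 1 <= num_ref:
            cells.add(j + 1)   # path from G(i-1, j) to G(i, j+1)
    return sorted(cells)
```

For example, with survivors at j = 2 and j + 3 = 5, `reachable_cells([2, 5], 8)` yields the indices 2, 3, 5 and 6, so that the cell at j + 2 = 4 is skipped, in agreement with the cross mark in FIG. 11B.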
A structure of a speech pattern matching system implemented on the basis of the concept explained above is shown in FIG. 12 while the processing performed by the system is illustrated in the flow chart depicted in FIG. 13.
Let G(i-1, k) denote the k-th of the cumulative distance values retained in the cumulative distance memory 29 as candidates for the DP paths at the preceding sampling time point (i-1), and let JP(k) denote the feature identifying number of the corresponding reference speech pattern.
Referring now to FIG. 13 in combination with FIG. 11, at steps (1) and (2), the variables i and k for the input speech pattern and the reference speech pattern are initialized.
In a succeeding step (3), a path along which computation is to be performed is selected.
In a step (4), a local distance d(i, j) is computed by a local distance computing module 22 shown in FIG. 12.
Subsequently, in a step (5), one of the temporary or candidate cumulative distances g(i, j) is determined by a temporary cumulative distance calculating module 23 in accordance with the following formula, which can be developed from the aforementioned expression (1), each of the two arguments of the minimum being one temporary cumulative distance g(i, j). Namely,

$$G(i, j) = \min\{d(i, j) + G(i-1, j),\ d(i, j) + G(i-1, j-1)\}$$
In a step (6), the temporary cumulative distance g(i, j) thus determined is compared with the other temporary cumulative distance when the latter has already been determined, and the smaller one is selected as the final cumulative distance G(i, j). If the other temporary cumulative distance has not yet been determined, the first-mentioned temporary cumulative distance is stored in a buffer 25 as the final cumulative distance G(i, j).
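The processing of steps (3) to (6) for a single time point i may be sketched as follows. This is a minimal sketch and not the implementation of FIG. 12: the function name `expand_paths`, the use of a dictionary to stand in for the buffer 25, the 0-based reference indices, and the assumption that the local distances d(i, j) for the current time point are supplied as a list are all hypothetical.

```python
def expand_paths(survivors, local_dist):
    """Compute the buffered cumulative distances G(i, j) for one time point.

    survivors:  list of (JP(k), G(i-1, k)) pairs retained at time point i-1,
                with JP(k) given as a 0-based reference index.
    local_dist: list in which local_dist[j] holds d(i, j) for the current i
                (assumed to be precomputed, standing in for step (4)).
    """
    buffer = {}  # stands in for the buffer 25: reference index j -> G(i, j)
    for jp, g_prev in survivors:
        for j in (jp, jp + 1):           # step (3): select a path to (i, j)
            if j >= len(local_dist):     # no reference feature beyond the end
                continue
            g_temp = local_dist[j] + g_prev  # step (5): temporary g(i, j)
            # Step (6): keep the smaller of the two temporary distances,
            # or store the first one if no other has been determined yet.
            if j not in buffer or g_temp < buffer[j]:
                buffer[j] = g_temp
    return buffer
```

Only the cells reachable from retained candidates receive an entry, so the size of the buffer is bounded by twice the number of survivors rather than by the length of the reference pattern.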
Upon completion of computation of all the possible paths through a loop of steps (7) and (8), the processing proceeds to a next step (9). In step (9), a threshold value $\theta(i)$ is calculated by a threshold computing module 29, for example, in accordance with

$$\theta(i) = \min_{j}\{G(i, j)\} + \lambda$$

where $\lambda$ represents an allowance value added to the minimum cumulative distance at each of the discrete time points.
In a step (10), a counter reserved in the buffer 25 is initialized.
By executing processing steps (11) to (14), the cumulative distances smaller than the threshold value are selected from those stored in the buffer 25 by a cumulative distance decision module 28 and stored in a cumulative distance memory 29.
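The threshold computation of step (9) and the selection of steps (10) to (14) may be sketched as follows. The function name `prune` and the representation of the buffer as a dictionary are assumptions; the strict comparison follows the wording "smaller than the threshold value" used above.

```python
def prune(buffer, allowance):
    """Select the cumulative distances to be carried to the next time point.

    buffer:    dictionary mapping reference index j to G(i, j).
    allowance: the allowance value (lambda) added to the minimum distance.
    Returns a list of (JP(k), G(i, k)) pairs ordered by reference index.
    """
    theta = min(buffer.values()) + allowance  # step (9): threshold for time i
    # Steps (10) to (14): keep only distances strictly below the threshold.
    return [(j, g) for j, g in sorted(buffer.items()) if g < theta]
```

It may be noted that, with a strict comparison, the allowance must be positive for the minimum itself to be retained; an allowance of zero would leave no candidate at all.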
Upon completion of the computations of the cumulative distances for all the input speech patterns through the loop including steps (15) and (16), a final result of the matching process is outputted from a result extracting module 31.