The present invention relates to a method of and apparatus for pattern matching. The invention has particular, although not exclusive relevance to the adjustment of a pruning threshold used in a dynamic programming pattern matching technique. In an exemplary embodiment, the dynamic programming matching technique is employed in a speech recognition system.
Speech recognition is a process by which an unknown speech utterance is identified. There are several different types of speech recognition systems currently available which can be categorised in several ways. For example, some systems are speaker dependent, whereas others are speaker independent. Some systems operate for a large vocabulary of words ( greater than 10,000 words) while others only operate with a limited sized vocabulary ( less than 1000 words). Some systems can only recognise isolated words whereas others can recognise phrases comprising a series of connected words.
In a limited vocabulary system, speech recognition is performed by comparing features of an unknown utterance with features of known words which are stored in a database. The features of the known words are determined during a training session in which one or more samples of the known words are used to generate reference patterns therefor.
To recognise the unknown utterance, the speech recognition apparatus extracts a pattern (or features) from the utterance and compares it against each reference pattern stored in a database. One way of comparing the pattern representative of the input utterance with the reference patterns is to use a dynamic programming matching technique, which provides an optimal time alignment between each of the reference patterns and the pattern extracted from the unknown utterance. This is achieved by locally shrinking or expanding the time axis of one pattern until there is an optimal match between the pairs of patterns. The reference pattern or sequence of reference patterns providing the best match identifies the word or words most likely to correspond to the input utterance.
One problem with the dynamic programming matching technique is that it is computationally expensive, since it involves the determination of many possible matchings between the incoming utterance and each reference model.
During the matching process, each possible matching is given a score which is dependent upon the closeness of the match. One method used to limit the amount of computations involved in the dynamic programming matching technique is to stop the processing of badly scoring matchings. In the art of speech recognition, this technique is known as pruning. However, a problem with using the pruning technique is that the number of possible matchings varies considerably and if there is only a fixed amount of memory available, then memory overflow may arise.
EP-A-0525640 (Fujitsu Limited) solves this problem by varying the threshold to ensure that the number of possible matchings processed at each time point lies between a given minimum and maximum number. In particular, the pruning threshold is varied in dependence upon a predicted number of possible matchings that will have to be processed at the next time point. The predicted number is derived from a linear extrapolation of the number of possible matchings which were processed at a current time point and the number of possible matchings which were processed at a proceeding time point. The process employed in EP-A-0525640 ensures that the actual number of possible matchings at each time point lies between the given minimum and maximum number by counting the possible matchings for a given threshold and adjusting the threshold until the condition is satisfied.
EP-A-0789348 discloses a similar system for adjusting the pruning threshold, except rather than estimating the number of possible matchings that will be processed at the next time point, the system disclosed uses the dynamic programming constraints to propagate the paths which end at the current time point to the next time point and counts the number of dynamic programming paths which have been propagated to the next time point and which have not been discarded.
The present invention aims to provide a more efficient pruning technique which is effective to reduce the number of possible matchings which are propagated, whilst maintaining the accuracy of the matching process.
According to one aspect, the present invention provides a method of matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal, the method comprising the steps of:
matching the first signal with the second signal using a matching processes which processes each first signal pattern in sequence and which propagates a plurality of paths, each path representing a possible matching between a sequence of second signal patterns and a sequence of first signal patterns, and each path having an associated cumulative value representative of the closeness of the match; and
controlling the matching step by comparing said cumulative values with a pruning value during the processing of each first signal pattern and discarding paths in dependence upon the result of said comparing step;
wherein a number of different pruning values are used in said controlling step during the processing of a current first signal pattern, and wherein the pruning value used for a given path during the processing of the current first signal pattern depends upon the position, within the sequence of patterns representing said second signal, of the second signal pattern which is at the end of the given path for the current first signal pattern being processed.
By using different pruning thresholds in this way, the number of paths propagating at each time point can be reduced, whilst maintaining the accuracy of the system. In particular, this is because it has been observed that the difference between the best path and the local minimum tends to be the greatest when the best path is traversing the first few patterns of the second signal.
Preferably, a soft pruning technique is performed, whereby some paths which have a cumulative value which is worse than the corresponding pruning value are not discarded. This has the benefit that where the best path is pruned out, neighbouring paths which will have scores similar to the best path will be kept and not pruned. Therefore, even if the best path is pruned out, paths sufficiently close to the best path will be retained so that the pruning does not result in recognition errors.