Speech recognition is a process by which an unknown speech utterance is identified. There are several different types of speech recognition systems currently available which can be categorised in several ways. For example, some systems are speaker dependent, whereas others are speaker independent. Some systems operate for a large vocabulary of words (>10,000 words) while others only operate with a limited-size vocabulary (<1000 words). Some systems can only recognise isolated words whereas others can recognise phrases comprising a series of connected words.
In a limited vocabulary system, speech recognition is performed by comparing features of an unknown utterance with features of known words which are stored in a database. The features of the known words are determined during a training session in which one or more samples of the known words are used to generate reference patterns therefor. The reference patterns may be acoustic templates of the modelled speech or statistical models, such as Hidden Markov Models.
To recognise the unknown utterance, the speech recognition apparatus extracts a pattern (or features) from the utterance and compares it against each reference pattern stored in the database. A scoring technique is used to provide a measure of how well each reference pattern, or each combination of reference patterns, matches the pattern extracted from the input utterance. The unknown utterance is then recognised as the word(s) associated with the reference pattern(s) which most closely match the unknown utterance.
Typically, the scoring is accomplished using a dynamic programming technique which provides an optimal time alignment between each of the reference patterns and the pattern extracted from the unknown utterance, by locally shrinking or expanding the time axis of one pattern until there is an optimal match between the pairs of patterns. The reference pattern or sequence of reference patterns having the best score identifies the word or words most likely to correspond to the input utterance.
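The time alignment described above can be illustrated with a minimal dynamic time warping sketch. It is not taken from any of the cited systems: the function names `dtw_distance` and `recognise`, the use of one-dimensional feature values, and the absolute-difference local distance are all assumptions made for illustration.

```python
import math

def dtw_distance(reference, utterance):
    """Cumulative cost of the best time alignment of two feature sequences.

    Assumption: each pattern is a list of scalar features and the local
    distance is the absolute difference; real systems use feature vectors.
    """
    n, m = len(reference), len(utterance)
    # cost[i][j]: best cumulative distance aligning reference[:i] with utterance[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - utterance[j - 1])
            # Stretch the reference, stretch the utterance, or advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

def recognise(templates, utterance):
    """Return the word whose reference pattern gives the lowest cumulative cost."""
    return min(templates, key=lambda word: dtw_distance(templates[word], utterance))
```

For example, `recognise({"yes": [1, 2, 3], "no": [3, 2, 1]}, [1, 1, 2, 3, 3])` selects `"yes"`, because the utterance is a locally stretched copy of that template. The full table holds one entry per pair of reference and input patterns, which is the source of the memory cost in this simple form.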
The dynamic programming matching technique is relatively expensive in terms of both computation and memory, as it involves the determination of many possible matchings between the incoming utterance and each reference model.
U.S. Pat. No. 4,592,086 (Nippon Electric Co. Limited) discloses a connected digit speech recognition system which uses a dynamic programming matching technique. U.S. Pat. No. 4,592,086 discloses that the amount of memory required for the matching process can be reduced if the patterns of a reference model, which are at an end of a dynamic programming path, are processed in reverse sequential order.
EP 0789348A1, subsequently issued as U.S. Pat. No. 5,907,824, discloses a system for matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal, wherein the system processes each first signal pattern in turn by:
defining as active patterns the second signal patterns which are at the end of a path of a current first signal pattern being processed, each path representing a possible matching between an ordered sequence of second signal patterns and an ordered sequence of first signal patterns ending at the current first signal pattern;
for each active pattern, storing a cumulative value which is indicative of the closeness of the match for the path which ends at that active pattern for the current first signal pattern; and
updating said cumulative values and propagating said paths based on constraints which are placed on the path propagation, by processing each active pattern in reverse sequential order;
wherein if, during the propagation of the path ending at the current active pattern being processed, there is an overlap with paths which have been propagated during the processing of previous active patterns, a comparison is made of the cumulative values associated with the paths in the overlap region, so that the path with the best score is propagated whilst the other paths are terminated.
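The steps listed above can be sketched as follows. This is one reading of the scheme under stated assumptions, not the implementation of the cited documents: the second signal patterns are treated as states of a single left-to-right reference model, a path may stay in its state or advance by at most `MAX_JUMP` states per input pattern, lower cumulative values are better, and the names `propagate_frame`, `score` and `local` are hypothetical.

```python
import math

MAX_JUMP = 2  # assumed constraint: a path may stay, or advance by up to 2 states

def propagate_frame(score, local):
    """Update the cumulative values in place for one first-signal pattern.

    score[s] is the cumulative value of the best path ending at state s
    (math.inf when no path ends there); local[s] is the distance between
    state s and the pattern currently being processed.
    """
    # Active patterns: states at which some path currently ends.
    active = [s for s, value in enumerate(score) if value < math.inf]
    for s in reversed(active):  # reverse sequential order, highest state first
        # score[s] is still the old value here, because the states handled
        # so far only wrote to positions above s.
        old = score[s]
        score[s] = old + local[s]  # the path stays in state s
        last = min(s + MAX_JUMP, len(score) - 1)
        for d in range(s + 1, last + 1):
            candidate = old + local[d]
            # Overlap: state d may already hold a path propagated from a
            # previously processed active state; keep only the better one.
            if candidate < score[d]:
                score[d] = candidate
    return score
```

Because the states are visited from the highest downwards, a single array suffices: each old value is read before anything overwrites it. Note that this sketch compares against every destination state rather than first determining which destinations lie in the overlap region.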
However, the system described in EP 0789348A1 and U.S. Pat. No. 5,907,824 is relatively slow, since it checks each second signal pattern to which the path associated with the current active pattern being processed can propagate, to see whether that pattern falls within the overlap region.