1. Field of the Invention
The present invention relates to pattern recognition, in which a set of feature vectors is formed from digitized incoming signals and compared with templates of candidate patterns.
2. Related Art
In pattern recognition, incoming signals are digitized, and a sequence of feature vectors is formed. These feature vectors are then compared to templates of the candidate patterns, e.g., sounds or images to be identified in the signal. In the case of speech recognition, the candidate patterns can represent, e.g., names in a phonebook.
However, pattern recognition such as speech recognition is computationally demanding. In many cases, for example when implemented in embedded devices, the limited amount of memory and computational power makes it necessary to reduce the complexity of the pattern recognition algorithm.
The computational complexity depends on several factors: the sampling rate, the number of candidate model templates, and the feature vector dimension. Reducing any of these makes recognition faster, so that it can run in reasonable time on a given processor, but it can also result in poorer recognition accuracy.
Furthermore, available resources are usually shared between different processes, and the available processing power and memory capacity are therefore variable. If the recognition functionality of an embedded device, having limited processing capacity to begin with, is to work at all times, it is even more crucial to minimize or dynamically adjust the processing requirements, without losing recognition accuracy.
Complexity reduction of pattern recognizers has conventionally been addressed by at least the following prior art techniques:
1. Feature Vector Down-sampling
This technique reduces the decoding complexity by using the state likelihood (SL) measure corresponding to one incoming feature vector in several consecutive frames (time instants), rather than evaluating a new SL measure for every frame.
2. Clustering of the Model Templates
This technique clusters the acoustic space off-line. During decoding, a quick search among the clusters is performed first, and then only the SL measures for the members of the best matching cluster are evaluated.
3. Lowering the Feature Vector Dimension
The number of feature vector components is reduced to a predefined number, using advanced linear transforms, such as PCA, LDA, etc., or neural networks.
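By way of illustration only, the three techniques listed above may be sketched as follows. The sketch is not taken from any cited reference and rests on stated assumptions: the SL measure is replaced by a negative squared Euclidean distance, the cluster centroids and cluster memberships are assumed to have been computed off-line, and the projection matrix is a hypothetical stand-in for a transform such as PCA or LDA obtained off-line. All function and parameter names are illustrative.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def state_likelihood(x: Vector, template: Vector) -> float:
    """Stand-in SL measure (assumption): negative squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(x, template))


def downsampled_scores(frames: Sequence[Vector],
                       templates: Sequence[Vector],
                       hold: int = 2) -> List[List[float]]:
    """Technique 1: evaluate SL measures only on every `hold`-th frame
    and reuse them for the frames in between."""
    scores: List[List[float]] = []
    cached: List[float] = []
    for t, x in enumerate(frames):
        if t % hold == 0:
            cached = [state_likelihood(x, m) for m in templates]
        scores.append(list(cached))
    return scores


def clustered_scores(x: Vector,
                     centroids: Sequence[Vector],
                     members: Sequence[Sequence[int]],
                     templates: Sequence[Vector]) -> Dict[int, float]:
    """Technique 2: quick search among cluster centroids, then evaluate
    SL measures only for the templates in the best matching cluster."""
    best = max(range(len(centroids)),
               key=lambda c: state_likelihood(x, centroids[c]))
    return {i: state_likelihood(x, templates[i]) for i in members[best]}


def project(x: Vector, matrix: Sequence[Vector]) -> List[float]:
    """Technique 3: reduce the dimension with a fixed linear transform.
    `matrix` has one row per retained component (hypothetical values)."""
    return [sum(w * a for w, a in zip(row, x)) for row in matrix]


if __name__ == "__main__":
    frames = [[0, 0], [1, 1], [4, 4]]
    templates = [[0, 0], [4, 4]]
    print(downsampled_scores(frames, templates, hold=2))
    print(clustered_scores([9, 9], [[0, 0], [10, 10]], [[0, 1], [2]],
                           [[0, 0], [1, 1], [10, 10]]))
    print(project([3, 4, 5], [[1, 0, 0], [0, 1, 0]]))
```

In each case the saving is fixed in advance: the hold factor, the clustering, and the number of rows of the projection matrix are chosen before decoding begins.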
Focusing on the third category, conventional examples of this technique do not have the flexibility to scale the computational complexity according to the available CPU power. Instead, the complexity is always dimensioned for the worst-case scenario. In addition, spectro-temporal linear transforms or neural network-based mappings may significantly increase the complexity of the front-end, and thus of the whole recognizer.
An example of feature vector dimension reduction is given in “Should recognizers have ears”, Speech Communication, Vol. 25, pp. 3-27, 1998.