In pattern recognition, pattern data is usually quantified by converting the data into feature vectors. For instance, in speech recognition, speech data is converted to feature vectors that commonly have 39 or 40 elements. The feature vectors are subsequently used to analyze the pattern data to determine patterns in the data. Generally, during training, a number of classes are developed. During real-time processing, new feature vectors, created from pattern data, can be assigned to particular classes and processed accordingly.
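The training and real-time steps described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each class is summarized by the mean of its training feature vectors and that a new feature vector is assigned to the class with the nearest mean; the text does not specify how classes are developed, and the function names `train_class_means` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def train_class_means(labeled_vectors):
    """Compute one mean feature vector per class from (label, vector) pairs."""
    sums = {}
    counts = defaultdict(int)
    for label, vec in labeled_vectors:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(vec, class_means):
    """Assign a new feature vector to the class whose mean is nearest (Euclidean)."""
    return min(class_means, key=lambda label: math.dist(vec, class_means[label]))


# Toy two-element feature vectors; speech feature vectors would commonly
# have 39 or 40 elements.
training = [
    ("a", [0.0, 0.0]),
    ("a", [1.0, 1.0]),
    ("b", [10.0, 10.0]),
    ("b", [11.0, 11.0]),
]
class_means = train_class_means(training)
assigned = classify([2.0, 2.0], class_means)
```

A nearest-mean rule is only one of many possible class models; it is used here because it keeps the training and assignment phases easy to see.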
Researchers are constantly trying to decrease the error rate of pattern recognition. One way to do this is to modify the number and type of features in a feature vector. Some of these changes have improved pattern recognition, and others have not.
Unfortunately, the only way currently available to determine whether adding features to, or changing features in, a feature vector affects the error rate of a pattern recognition system is to use unmodified feature vectors, determine the error rate, modify the feature vectors, and determine the new error rate. If the new error rate is lower than the original error rate, the additional or changed features have helped pattern recognition. This can be a time-consuming and laborious process.
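The compare-by-rerunning procedure described above can be sketched as follows. This is an illustrative sketch only: the nearest-mean classifier, the toy data, and the names `nearest_mean_error_rate`, `baseline_features`, and `modified_features` are assumptions made for the example and are not taken from any particular system.

```python
import math


def nearest_mean_error_rate(train, test, extract):
    """Train a nearest-mean classifier on extract(sample); return the test error rate."""
    groups = {}
    for label, sample in train:
        groups.setdefault(label, []).append(extract(sample))
    # One mean feature vector per class.
    means = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }
    errors = 0
    for label, sample in test:
        vec = extract(sample)
        predicted = min(means, key=lambda c: math.dist(vec, means[c]))
        if predicted != label:
            errors += 1
    return errors / len(test)


def baseline_features(sample):
    """Unmodified feature vector: uses only the first measurement."""
    return [sample[0]]


def modified_features(sample):
    """Modified feature vector: adds the second measurement as a new feature."""
    return [sample[0], sample[1]]


# Toy data in which the classes differ mainly in the second measurement.
train = [("a", (0.0, 0.0)), ("a", (0.2, 0.1)), ("b", (0.1, 5.0)), ("b", (0.3, 5.2))]
test = [("a", (0.25, 0.2)), ("b", (0.05, 4.9)), ("a", (0.1, 0.0)), ("b", (0.2, 5.1))]

# Step 1: error rate with unmodified feature vectors.
baseline_error = nearest_mean_error_rate(train, test, baseline_features)
# Step 2: error rate after modifying the feature vectors.
modified_error = nearest_mean_error_rate(train, test, modified_features)
# Step 3: the change helped only if the new error rate is lower.
feature_change_helped = modified_error < baseline_error
```

Even in this toy setting, the whole train-and-test cycle runs twice for a single candidate change, which is what makes the approach laborious at realistic scale.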
A problem associated more specifically with cepstral features in speech applications is that these features can be corrupted by wide-band noise. Thus, the noise immunity of cepstral features in feature vectors is less than ideal.
For speech applications, one feature that speech systems can analyze is a “formant” feature. Voiced sounds have a particular formant structure when viewed in the frequency domain. This formant structure is basically a spectral envelope that overlies an underlying speech amplitude curve, and it usually has three “humps” that decrease with increasing frequency. Conversely, unvoiced sounds have a fairly random structure when viewed in the frequency domain. Some speech processing systems try to determine representative formant features, which can include determining multiple peaks of the formant structure. Multiple peak selection can be fairly complex. Most speech processing systems also try to determine formant features even in unvoiced speech, which does not have a formant structure. This can make the formant features very noisy for these unvoiced speech regions.
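To make the idea of multiple peak selection concrete, the following sketch picks the largest local maxima from a sampled spectral envelope. It is a simplified illustration: the synthetic three-hump envelope, the hump center frequencies, the hump width, and the names `envelope_peaks` and `synthetic_envelope` are all assumptions made for the example. Real envelopes are noisier, which is part of why peak selection becomes complex.

```python
import math


def envelope_peaks(envelope, max_peaks=3):
    """Return indices of the largest local maxima of a sampled spectral envelope."""
    peaks = [
        i
        for i in range(1, len(envelope) - 1)
        if envelope[i - 1] < envelope[i] >= envelope[i + 1]
    ]
    # Keep the highest peaks, then report them in order of increasing frequency.
    peaks.sort(key=lambda i: envelope[i], reverse=True)
    return sorted(peaks[:max_peaks])


def synthetic_envelope(freqs_hz):
    """Toy voiced-speech envelope: three humps whose height decreases with frequency."""
    humps = [(500.0, 1.0), (1500.0, 0.6), (2500.0, 0.3)]  # (center Hz, height), hypothetical
    width = 150.0  # hypothetical hump width in Hz
    return [
        sum(height * math.exp(-((f - center) / width) ** 2) for center, height in humps)
        for f in freqs_hz
    ]


freqs = [i * 50.0 for i in range(81)]  # 0 Hz to 4000 Hz in 50 Hz steps
envelope = synthetic_envelope(freqs)
formant_estimates = [freqs[i] for i in envelope_peaks(envelope)]
```

Applied to an unvoiced frame, whose spectrum has a fairly random structure, the same routine would still return three "peaks", which illustrates how formant features become noisy in unvoiced regions.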
Consequently, what is needed is a better way of overcoming the problems of non-ideal pattern recognition when using feature vectors, lengthy and complex determination of whether new or different features improve pattern recognition, poor noise resistance of feature vectors, complex multiple peak selection for formant structures, and noisy formant features for unvoiced speech regions.