1. Field of the Invention
The present invention relates to a feature extraction apparatus and method and to a pattern recognition apparatus and method. In particular, the invention relates to such apparatuses and methods that are suitable for use where speech recognition is performed in a noisy environment.
2. Description of the Related Art
FIG. 1 shows an example configuration of a conventional pattern recognition apparatus.
An observation vector as a pattern recognition object is input to a feature extraction section 101. The feature extraction section 101 determines, based on the observation vector, a feature vector that represents its feature quantity. The feature vector thus determined is supplied to a discrimination section 102. Based on the feature vector supplied from the feature extraction section 101, the discrimination section 102 judges which of a predetermined number of classes the input observation vector belongs to.
For example, where the pattern recognition apparatus of FIG. 1 is a speech recognition apparatus, speech data of each time unit (hereinafter referred to as a frame where appropriate) is input to the feature extraction section 101 as an observation vector. The feature extraction section 101 acoustically analyzes the speech data as the observation vector, and thereby extracts a feature vector as a feature quantity of speech such as a power spectrum, cepstrum coefficients, or linear prediction coefficients. The feature vector is supplied to the discrimination section 102. The discrimination section 102 classifies the feature vector as one of a predetermined number of classes. A classification result is output as a recognition result of the speech data (observation vector).
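As an illustration of the acoustic analysis described above, the following minimal sketch computes a power spectrum from one frame of speech data using a direct discrete Fourier transform. The function name `power_spectrum` and the four-sample frame are assumptions made for this example only; the text does not specify how the feature extraction section 101 is implemented.

```python
import cmath
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Return |X[k]|^2 for each DFT bin k of one frame (hypothetical helper)."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        # Direct DFT of the frame at bin k.
        x_k = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# Observation vector: one short frame of samples (illustrative values).
frame = [1.0, 0.0, -1.0, 0.0]

# Feature vector: the power spectrum of that frame.
feature_vector = power_spectrum(frame)
```

A practical system would use a fast Fourier transform and a windowed frame of several hundred samples; the direct form is used here only to keep the sketch self-contained.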
Known methods by which the discrimination section 102 judges which of a predetermined number of classes a feature vector belongs to include methods using a Mahalanobis discriminant function, a mixed normal distribution function, or a polynomial function; a method using a hidden Markov model (HMM); and a method using a neural network.
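For illustration, a discrimination step based on a Mahalanobis discriminant function can be sketched as follows. This is a simplified example, not the discrimination section 102 itself: it assumes each class is described by a mean vector and a diagonal covariance (one variance per dimension), and the class names and parameter values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Each class model is (mean vector, per-dimension variance); the diagonal
# covariance is a simplifying assumption for this sketch.
ClassModel = Tuple[List[float], List[float]]


def mahalanobis_sq(
    y: Sequence[float], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Squared Mahalanobis distance from y to a class with diagonal covariance."""
    return sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mean, var))


def classify(y: Sequence[float], models: Dict[str, ClassModel]) -> str:
    """Return the name of the class whose Mahalanobis distance to y is smallest."""
    return min(models, key=lambda name: mahalanobis_sq(y, *models[name]))


# Hypothetical class models.
models: Dict[str, ClassModel] = {
    "class_a": ([0.0, 0.0], [1.0, 1.0]),
    "class_b": ([4.0, 4.0], [4.0, 4.0]),
}

# This feature vector is closer to class_a in Euclidean terms, but the larger
# variance of class_b makes class_b the nearer class in Mahalanobis terms.
label = classify([1.8, 1.8], models)
```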
For example, the details of the above speech recognition techniques are disclosed in “Fundamentals of Speech Recognition (I) and (II),” co-authored by L. Rabiner and B-H. Juang, translation supervised by Furui, NTT Advanced Technology Corp., 1995. General pattern recognition is described in detail in, for example, R. Duda and P. Hart, “Pattern Classification and Scene Analysis,” John Wiley & Sons, 1973.
Incidentally, when pattern recognition is performed, an observation vector (input pattern) as a pattern recognition object generally includes noise. For example, a voice that is input as an observation vector when speech recognition is performed includes noise from the environment in which the user speaks (e.g., voices of other persons or noise of a car). To give another example, an image that is input as an observation vector when image recognition is performed includes noise from the environment in which the image was photographed (e.g., noise relating to weather conditions such as mist or rain, or noise due to lens aberrations of the camera used to photograph the image).
Spectral subtraction is known as one of the feature quantity (feature vector) extraction methods used for recognizing voices in a noisy environment.
In spectral subtraction, the input before a voice occurs (i.e., the input before a speech section) is treated as noise, and an average spectrum of the noise is calculated. When a voice is subsequently input, the average noise spectrum is subtracted from the voice, and a feature vector is calculated by treating the remaining component as the true voice component.
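The procedure just described can be sketched as follows. This is a minimal example under stated assumptions: frames are represented as lists of power-spectrum values, the function names are hypothetical, and clamping negative results to a floor of zero is an assumption added here that the text above does not mention.

```python
from typing import List, Sequence


def average_spectrum(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average spectrum over the frames observed before the speech section."""
    count = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / count for k in range(bins)]


def spectral_subtract(
    frame: Sequence[float], noise_avg: Sequence[float], floor: float = 0.0
) -> List[float]:
    """Subtract the average noise spectrum from one frame, bin by bin.

    Results below `floor` are clamped to `floor` (an assumption of this sketch).
    """
    return [max(s - n, floor) for s, n in zip(frame, noise_avg)]


# Frames input before the speech section are treated as noise.
noise_frames = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
noise_avg = average_spectrum(noise_frames)

# A frame input during the speech section (illustrative values).
speech_frame = [5.0, 1.0, 4.0]

# The remaining component is treated as the true voice component.
voice_component = spectral_subtract(speech_frame, noise_avg)
```

With these illustrative values, `noise_avg` is `[2.0, 2.0, 2.0]` and `voice_component` is `[3.0, 0.0, 2.0]`; the second bin would be negative without the floor.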
For example, the details of the spectral subtraction are disclosed in S. F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 2, 1979; and P. Lockwood and J. Boudy, “Experiments with a Nonlinear Spectral Subtractor, Hidden Markov Models and the Projection, for Robust Speech Recognition in Cars,” Speech Communication, Vol. 11, 1992.
Incidentally, the feature extraction section 101 of the pattern recognition apparatus of FIG. 1 can be regarded as executing a process in which an observation vector a, representing a certain point in the observation vector space, is mapped to (converted into) a feature vector y representing a corresponding point in the feature vector space, as shown in FIG. 2.
Therefore, the feature vector y represents a certain point (corresponding to the observation vector a) in the feature vector space. In FIG. 2, each of the observation vector space and the feature vector space is drawn as a three-dimensional space.
In spectral subtraction, an average noise component spectrum is subtracted from the observation vector a, and the feature vector y is then calculated. However, since the feature vector y represents a single point in the feature vector space as described above, it reflects only the average characteristics of the noise; it does not reflect characteristics that represent the irregularity of the noise, such as its variance.
Therefore, the feature vector y does not sufficiently reflect the features of the observation vector a, and hence it is difficult to obtain a high recognition rate with such a feature vector y.
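The limitation can be illustrated with a small numerical sketch. The values below are hypothetical and a single spectral bin is considered for simplicity: two noise sequences with the same average but different variance lead to exactly the same feature value after the average is subtracted, so the feature carries no information about how irregular the noise was.

```python
from statistics import mean, pvariance

# Two hypothetical noise sequences for one spectral bin.
steady_noise = [2.0, 2.0, 2.0, 2.0]    # no fluctuation
erratic_noise = [0.0, 4.0, 0.0, 4.0]   # same average, large fluctuation

# Observed value of the same bin during the speech section.
observed = 5.0

# Subtracting the average noise yields a single point in either case.
feature_steady = observed - mean(steady_noise)
feature_erratic = observed - mean(erratic_noise)

# The variance differs, but it is not reflected in the feature values.
variance_steady = pvariance(steady_noise)
variance_erratic = pvariance(erratic_noise)
```

Here both feature values equal 3.0, while the noise variances are 0.0 and 4.0 respectively.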
The present invention has been made in view of the above circumstances, and an object of the invention is therefore to increase the recognition rate.