1. Field of the Invention
The present invention relates to a speech recognition apparatus and method which can recognize speech in a noisy environment such as in a car.
2. Description of the Related Art
In recent years, speech recognition interfaces and speech recognition apparatuses have become increasingly popular. A typical speech recognition apparatus recognizes speech input through an input device, such as a microphone, by comparing the input speech with models of speech, words or phonemes (hereinafter referred to as "speech models") stored in advance in a memory device.
In a moving car, there are a variety of noises, such as frictional noise between the tires and the road surface, wind noise, music from a car stereo and so on. If speech recognition is performed in such a noisy environment, not only the human voice but also these noises are input through the microphone, so that the accuracy of the speech recognition decreases.
If the distance between the microphone and the mouth of the car driver is very short, it is possible to increase the accuracy of the speech recognition in the car. However, for safety reasons, it is undesirable for the car driver to move his or her head near the microphone while driving. If a headset type microphone is used, the microphone can always be positioned near the mouth of the car driver. However, wearing such a headset is uncomfortable for the car driver.
On the other hand, if the speech models are modified so as to be adapted to the noisy environment, it is also possible to increase the accuracy of the speech recognition in the car. The paper "Recognition of noisy speech by composition of hidden Markov models" by F. Martin, K. Shikano, Y. Minami and Y. Okabe, published in Technical Report of IEICE (the Institute of Electronics, Information and Communication Engineers), SP92-96, pp. 9-16, December 1992, describes a method of combining a hidden Markov model (hereinafter referred to as an "HMM") of clean speech and an HMM of noise. According to this paper, an HMM of a combination of clean speech and noise can be generated by the following equation (1).
$$ X = \Gamma^{-1} \log\left(e^{\Gamma S} + k \cdot e^{\Gamma N}\right) \qquad (1) $$
In equation (1), "X" represents a feature parameter of the HMM of the combination of the clean speech and noise, "S" represents a feature parameter of the HMM of the clean speech, "N" represents a feature parameter of the HMM of the noise, "k" represents a value corresponding to the signal-to-noise ratio, and "Γ" represents a Fourier transform.
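As a reading aid only, applying the transform Γ to both sides of equation (1) and then exponentiating gives an equivalent form. This rearrangement is not quoted from the cited paper; it follows directly from the equation as written.
$$ e^{\Gamma X} = e^{\Gamma S} + k \cdot e^{\Gamma N} $$
Assuming that ΓS and ΓN are logarithmic spectra, the rearranged form says that the spectrum of the noisy speech is the sum of the clean-speech spectrum and the noise spectrum scaled by k, and that the combined parameter X is obtained by mapping this sum back to the original parameter domain.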
The feature parameters X, S and N are linear predictive coding (LPC) cepstrum parameters. Therefore, in order to combine the feature parameter S and the feature parameter N, a number of complex calculations, such as a Fourier transform, an exponential transform, a logarithm transform and an inverse Fourier transform, are required, as shown in FIG. 1. Carrying out such complex calculations with a computer takes a long time.
If speech recognition using equation (1) is applied to a car navigation apparatus, the response speed of the apparatus becomes very slow.
In a car navigation apparatus, speech recognition is used as a man-machine interface for inputting information, for example, a destination. A word representing a destination, namely a place name, is relatively short and consists of a small number of phonemes. In the case of speech recognition for a car navigation apparatus, the total vocabulary is also relatively small. Therefore, speech recognition using rigorous processes, such as equation (1), may not be required. In order to increase the response speed of the car navigation apparatus, it is necessary to make the speech recognition processes simpler.
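The chain of transforms in equation (1) can be illustrated with a minimal sketch. It is not the implementation of the cited paper: the transform Γ is modeled here, as an assumption, by an orthonormal discrete cosine transform (chosen because its inverse is simply its transpose), and the names `dct_matrix`, `apply_matrix` and `combine_cepstra` are hypothetical.

```python
import math

def dct_matrix(size):
    """Orthonormal DCT-II matrix, an assumed stand-in for the transform Γ."""
    rows = []
    for k in range(size):
        scale = math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / size)
                     for i in range(size)])
    return rows

def apply_matrix(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def combine_cepstra(speech, noise, k):
    """Evaluate X = Γ^-1 log(exp(Γ S) + k * exp(Γ N)) for two parameter vectors."""
    gamma = dct_matrix(len(speech))
    # The matrix is orthonormal, so its inverse is its transpose.
    gamma_inv = [list(col) for col in zip(*gamma)]
    spec_s = apply_matrix(gamma, speech)   # Γ S
    spec_n = apply_matrix(gamma, noise)    # Γ N
    # Exponential transform, weighted sum, then logarithm transform.
    combined = [math.log(math.exp(a) + k * math.exp(b))
                for a, b in zip(spec_s, spec_n)]
    return apply_matrix(gamma_inv, combined)  # Γ^-1 (...)

# Illustrative values only; they are not taken from any measurement.
speech = [1.0, 0.5, -0.2, 0.1]
noise = [0.3, -0.1, 0.0, 0.2]
x = combine_cepstra(speech, noise, 0.5)
```

Even in this small sketch, each combined vector costs two forward transforms, one inverse transform, and an exponential and a logarithm per component, and the same work must be repeated for every feature parameter of every speech model.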