The present invention generally relates to a noise eliminating apparatus, and more particularly to a noise eliminating apparatus which eliminates components of background noise from a speech signal input to a speech recognition apparatus. Further, the present invention is directed to a speech recognition apparatus using such a noise eliminating apparatus. The present invention is suitably applied to speech recognition in noisy environments such as vehicles, offices, homes, and factories.
In speech recognition, the presence of background noise in input speech greatly degrades the recognition rate. Thus, the elimination of background noise from input speech is a serious problem which has to be solved when putting a speech recognition apparatus to practical use. For example, speech recognition techniques are being directed to applications in a running vehicle, such as audio control, navigation system control, and voice dialing control. In such applications, it is difficult to use a microphone which has a high signal-to-noise ratio (S/N ratio) and is attached in the vicinity of the mouth of a speaker, such as a close-talking microphone. For this reason, a variety of background noises, such as engine sounds, sounds resulting from running wheels, and reproduced sounds from radio or stereo sets, are added to the speech which is input through the microphone, and these noises deteriorate the ability to recognize the input speech. Out of the above-mentioned noises, sounds from engines and wheels depend on the vehicle speed and vary greatly depending on the environment. Likewise, reproduced sounds from radio or stereo sets change greatly in frequency and amplitude level. From this point of view, there is a need to provide noise eliminating techniques which are independent of the magnitude of background noise and which are capable of effectively eliminating noise components even when the frequency range of the noise changes.
There is known a spectral subtraction method which is generally used for eliminating noise components in a speech input to a speech recognition apparatus, and which uses time-spectral patterns as features of speech (see "SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION", S.F. Boll, IEEE Trans. ASSP-27, No. 2, pp. 113-120, 1979). The proposed subtraction method includes the steps of averaging an input over a section where there is no voice, holding the averaged input as a noise spectrum, and subtracting the noise spectrum from a spectrum of an input speech containing noise components. The subtraction result is output as a finalized speech spectrum. It should be noted that the proposed method is based on an assumption that background noise is stationary on the time base, such as white noise or Hoth noise. Thus, the proposed method is effective regarding stationary noise, but less effective regarding dynamic noise. Particularly, when dynamic noise has large-level components, the proposed method cannot eliminate such noise components effectively.
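The three steps described above (averaging over a section without voice, holding the average as a noise spectrum, and subtracting it from the input spectrum) may be sketched as follows. This is a minimal illustration only, not the implementation of the cited document: the function names, the four-band spectra, and the clamping of negative results to a floor value are assumptions introduced for this example.

```python
def estimate_noise_spectrum(noise_frames):
    """Average the spectra of frames taken from a section with no voice."""
    n_frames = len(noise_frames)
    n_bands = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames
            for k in range(n_bands)]


def spectral_subtract(frame, noise_spectrum, floor=0.0):
    """Subtract the held noise spectrum from one input spectrum, band by band.

    Results below `floor` are clamped to `floor` (an assumption of this sketch).
    """
    return [max(x - n, floor) for x, n in zip(frame, noise_spectrum)]


# Hypothetical 4-band magnitude spectra observed while nobody is speaking.
noise_frames = [[2.0, 1.0, 0.5, 0.5],
                [4.0, 1.0, 1.5, 0.5]]
noise = estimate_noise_spectrum(noise_frames)

# Hypothetical spectrum of one frame of speech containing noise components.
noisy_speech = [10.0, 1.5, 0.5, 6.5]
clean = spectral_subtract(noisy_speech, noise)
print(noise)  # [3.0, 1.0, 1.0, 0.5]
print(clean)  # [7.0, 0.5, 0.0, 6.0]
```

Because the noise spectrum is held fixed after the no-voice section, any change in the noise during the subsequent speech is not reflected in the subtraction, which is consistent with the stationarity assumption noted above.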
There is also known an adaptive noise cancelling method which uses a primary input and a reference input (see "ADAPTIVE NOISE CANCELLING: PRINCIPLES AND APPLICATIONS", B. Widrow et al., Proc. IEEE, Vol. 63, No. 12, pp. 1692-1716, 1975). Further, there is known a noise cancelling apparatus which is related to the above-identified adaptive noise cancelling method (see Japanese Laid-Open Patent Application No. 1-239596 published on Sept. 25, 1989, which corresponds to U.S. Pat. Application S. N. 167,619 filed on Mar. 14, 1988). An adaptive filter disclosed in the above Japanese application corresponds to an improvement of the adaptive noise cancelling method disclosed in the document by Widrow et al., in which a coefficient directed to compensating the difference in amplitude and phase between the two inputs is provided for each of a plurality of frequency ranges so that noise components arising from a plurality of noise sources can be suppressed. However, adaptive noise cancelling methods using two inputs have a disadvantage in that it is difficult to suppress noise effectively when the noise is small, because the values of the coefficients determined under such a condition have large errors. For this reason, a spectral subtraction method using a single input is suitable for dynamic noise having a small level, rather than the spectral subtraction method using two inputs.
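The principle of a two-input adaptive noise canceller may be sketched as follows, using a least-mean-square (LMS) update of a short transversal filter. This is a simplified time-domain illustration under stated assumptions and is not the per-frequency-range filter of the above Japanese application: the function name, the number of taps, the step size `mu`, and the sinusoidal test noise are all hypothetical.

```python
import math


def lms_cancel(primary, reference, n_taps=4, mu=0.05):
    """Subtract from the primary input the part predictable from the reference.

    Returns the residual signal (the noise-reduced output) and the final
    filter coefficients. n_taps and mu are assumptions of this sketch.
    """
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate  # residual after removing the estimated noise
        # LMS update: move each coefficient along the error gradient.
        weights = [w + 2.0 * mu * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output, weights


# Hypothetical noise-only example: the reference microphone picks up the
# noise, and the primary microphone picks up the same noise at half amplitude.
reference = [math.sin(0.3 * n) for n in range(2000)]
primary = [0.5 * x for x in reference]
residual, weights = lms_cancel(primary, reference)
print(max(abs(e) for e in residual[:20]))    # large while still adapting
print(max(abs(e) for e in residual[-100:]))  # close to zero after adaptation
```

In this sketch the coefficients are driven by the product of the residual and the reference samples, so when the reference noise is small the updates are small and the coefficients are determined slowly and imprecisely, which illustrates the disadvantage noted above.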
One may consider, from the above-mentioned advantages and disadvantages, that a certain threshold level is provided and noise processing is switched between the spectral subtraction method using a single input and the spectral subtraction method using two inputs by comparing the level of background noise with the threshold level. However, this approach has the following shortcomings. First, when the level of background noise is close to the threshold level, the disadvantages of both of the above two methods appear. Second, it is very difficult to handle equally the noise-eliminated speech patterns which are derived from the two methods. For these first and second reasons, the recognition rate drops considerably when the noise level is in the vicinity of the threshold level. Third, when the noise level is close to the threshold level, it is necessary to carry out both methods, which increases the amount of data to be processed.
Moreover, in general, the setting of the various coefficients for eliminating noise components, not only in conventional spectral subtraction methods but also in adaptive noise cancelling methods, is carried out in a section other than a speech section. That is, the procedure for renewing such coefficients is not performed in a speech section. If dynamic noise changes during the speech section (in its level, its frequency, or the position of its source), appropriate coefficient values cannot be selected, and the noise components therefore cannot be eliminated.