1. Field of the Invention
The present invention relates to a noise power estimation system, a noise power estimating method, a speech recognition system and a speech recognizing method.
2. Background Art
In order to achieve natural human-robot interaction, a robot should recognize human speech even in the presence of noise and reverberation. To avoid performance degradation of automatic speech recognizers (ASR) caused by interference such as background noise, many speech enhancement processes have been applied to robot audition systems [K. Nakadai, et al., “An open source software system for robot audition HARK and its evaluation,” in 2008 IEEE-RAS Int'l Conf. on Humanoid Robots (Humanoids 2008). IEEE, 2008; J. Valin, et al., “Enhanced robot audition based on microphone array source separation with post-filter,” in IROS2004. IEEE/RSJ, 2004, pp. 2123-2128; S. Yamamoto, et al., “Making a robot recognize three simultaneous sentences in real-time,” in IROS2005. IEEE/RSJ, 2005, pp. 897-892; and N. Mochiki, et al., “Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot,” in 2004 Int'l Conf. on Spoken Language Processing (ICSLP2004), 2004, p. WeA1705o.4.]. Speech enhancement processes require an estimate of the noise spectrum.
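To illustrate why enhancement depends on the noise estimate, the following is a minimal sketch of power spectral subtraction, one common speech enhancement process. It is not taken from any of the cited systems; the function name `spectral_subtraction` and the `floor` parameter are hypothetical and chosen only for illustration. Each bin of the enhanced spectrum is the noisy power minus the estimated noise power, so any error in the noise estimate passes directly into the output.

```python
def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Subtract an estimated noise power spectrum from one noisy frame.

    noisy_power: per-frequency-bin power of the noisy input frame.
    noise_power: per-frequency-bin estimate of the noise power.
    floor:       hypothetical spectral-floor factor; results below
                 floor * noisy_power are clamped to that value.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    # Clamp to a small positive floor instead of allowing negative power.
    return [max(y - n, floor * y) for y, n in zip(noisy_power, noise_power)]


enhanced = spectral_subtraction([10.0, 4.0, 1.0], [2.0, 4.0, 3.0])
```

An overestimated noise bin (the third bin here) is clamped to the floor, while an underestimated one leaves residual noise in the output; either way, the quality of the result depends on the noise estimate.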
For example, the Minima-Controlled Recursive Average (MCRA) method [I. Cohen and B. Berdugo, “Speech enhancement for non-stationary noise environments,” Signal Processing, vol. 81, pp. 2403-2481, 2001.] has been employed for noise spectrum estimation. MCRA tracks the minimum level of the spectrum and judges whether the current input signal contains voice activity or only noise by thresholding the ratio of the input energy to the minimum energy. MCRA therefore implicitly assumes that the minimum level of the noise spectrum does not change. If the noise is not steady-state and its minimum level changes, it is very difficult to choose a fixed value for the threshold parameter. Moreover, even if a threshold finely tuned for one non-steady-state noise works properly, the process easily fails for other noises, including ordinary steady-state noise.
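The dependence on a fixed, level-based threshold can be seen in a minimal sketch of the minima-controlled idea for a single frequency bin. This is a simplified illustration, not Cohen and Berdugo's exact algorithm (which, among other things, smooths a speech presence probability rather than making a hard decision). The function name `mcra_like_noise_estimate` and the parameters `alpha_s`, `alpha_d`, `window` and `delta` are hypothetical.

```python
from collections import deque


def mcra_like_noise_estimate(power, alpha_s=0.8, alpha_d=0.95, window=50, delta=5.0):
    """Simplified minima-controlled noise tracking for one frequency bin.

    power:   per-frame input power |Y|^2 for a single bin.
    alpha_s: smoothing factor for the input power (hypothetical value).
    alpha_d: smoothing factor for the noise estimate (hypothetical value).
    window:  number of frames over which the minimum is searched.
    delta:   fixed threshold on the ratio smoothed / minimum.
    Returns (noise_estimates, speech_flags), one entry per frame.
    """
    history = deque(maxlen=window)  # recent smoothed powers for the minimum search
    smoothed = None
    noise = None
    noise_out, flags = [], []
    for p in power:
        # Recursively smooth the input power.
        smoothed = p if smoothed is None else alpha_s * smoothed + (1 - alpha_s) * p
        history.append(smoothed)
        s_min = min(history)
        # Level-based decision: power above delta * minimum counts as speech.
        speech = smoothed > delta * s_min
        if noise is None:
            noise = p  # initialise with the first frame
        elif not speech:
            # Update the noise estimate only in frames judged to be noise.
            noise = alpha_d * noise + (1 - alpha_d) * p
        noise_out.append(noise)
        flags.append(speech)
    return noise_out, flags


# A noise floor that steps from 1.0 to 10.0 at frame 100, with no speech at all.
signal = [1.0] * 100 + [10.0] * 200
noise_est, speech_flags = mcra_like_noise_estimate(signal)
```

In this run the rise in the noise floor is misjudged as speech until the pre-step minimum leaves the 50-frame window, and during that time the noise estimate stays frozen below 2 although the true noise power is 10. This is the failure mode described above for noise whose minimum level changes.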
Thus, it has been difficult to carry out a speech enhancement process whose parameters remain appropriate as the noise environment changes.
In other words, no noise power estimation system, noise power estimating method, automatic speech recognition system or automatic speech recognizing method has been developed that does not require a level-based threshold parameter and is highly robust against changes in the noise environment.
Accordingly, there is a need for a noise power estimation system, a noise power estimating method, an automatic speech recognition system and an automatic speech recognizing method that do not require a level-based threshold parameter and are highly robust against changes in the noise environment.