The present invention relates to noise suppressing devices, noise suppressing methods, and noise suppressing programs. In particular, the present invention relates to a noise suppressing device, a noise suppressing method, and a noise suppressing program that suppress a noise component mixed with a speech signal by processing the speech signal in the frequency domain.
A spectral subtraction (SS) method for subtracting a spectrum of a noise component (noise spectrum) from a spectrum of an input speech signal (input spectrum) is disclosed in S. F. Boll, “Suppression of acoustic noise using spectral subtraction”, IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 2, pp. 113 to 120, April 1979 (referred to as “Non Patent Literature 1” hereinafter).
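The subtraction described above can be sketched as follows. This is a minimal illustration only, assuming that the input spectrum and the estimated noise spectrum are already available as per-bin magnitudes; the function name `spectral_subtraction` and the `floor` parameter are hypothetical and do not come from Non Patent Literature 1. Negative differences are clamped so that no bin receives a negative magnitude.

```python
def spectral_subtraction(input_mag, noise_mag, floor=0.0):
    """Subtract an estimated noise magnitude from each input bin.

    input_mag, noise_mag: per-bin magnitudes of the same length.
    floor: hypothetical parameter; fraction of the input magnitude kept
           as a lower bound (0.0 gives plain clamping at zero).
    """
    if len(input_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    # A bin whose noise estimate exceeds the input is clamped to the floor.
    return [max(x - n, floor * x) for x, n in zip(input_mag, noise_mag)]


if __name__ == "__main__":
    print(spectral_subtraction([4.0, 1.0, 8.0], [1.0, 2.0, 3.0]))
```

In a complete system the suppressed magnitudes would be recombined with the phase of the input spectrum before the inverse transform; that step is omitted here.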
A minimum mean square error short time spectral amplitude (MMSE-STSA) method for multiplying an input spectrum by a spectral gain selected so as to emphasize a speech component is disclosed in Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator”, IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-32, No. 6, pp. 1109 to 1121, December 1984 (referred to as “Non Patent Literature 2” hereinafter).
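The multiplicative structure of such a gain-based method can be sketched as follows. The MMSE-STSA gain of Non Patent Literature 2 involves modified Bessel functions and is not reproduced here; the sketch substitutes a simpler Wiener-type gain, which is a different rule, purely to show how a per-bin gain is computed from power spectra and then multiplied onto the input spectrum. The function names are hypothetical.

```python
def wiener_type_gains(input_power, noise_power):
    """Per-bin gain in [0, 1] from input and estimated noise power.

    NOTE: this is a Wiener-type stand-in, NOT the MMSE-STSA gain.
    """
    if len(input_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    gains = []
    for x, n in zip(input_power, noise_power):
        # Bins with no input energy are muted; otherwise keep the
        # estimated speech fraction of the bin, never below zero.
        gains.append(max(1.0 - n / x, 0.0) if x > 0.0 else 0.0)
    return gains


def apply_spectral_gain(input_spectrum, gains):
    """Multiply each (real or complex) bin by its gain."""
    return [g * s for g, s in zip(gains, input_spectrum)]


if __name__ == "__main__":
    g = wiener_type_gains([4.0, 1.0, 2.0], [1.0, 2.0, 0.0])
    print(apply_spectral_gain([2.0 + 2.0j, 1.0, 3.0], g))
```

Because the gain is real-valued, multiplying a complex bin by it changes only the magnitude and leaves the phase untouched.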
The methods discussed in Non Patent Literature 1 and Non Patent Literature 2 both require an estimate of the noise spectrum that is mixed with the input spectrum. This noise spectrum is estimated separately, and the estimated noise spectrum includes an estimation error. Due to the effect of this estimation error, when noise is suppressed in the frequency domain as in the technologies discussed in Non Patent Literature 1 and Non Patent Literature 2, isolated frequency components remain scattered along the time axis and the frequency axis in the spectrum after the suppressing process (output spectrum). The listener perceives these isolated frequency components as unpleasant musical noise.
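As an illustration of what "isolated" means here, the following sketch counts components of an output spectrum that have no active neighbor along either the time axis or the frequency axis. It is not taken from the cited literature: the function name, the four-neighbor definition of isolation, and the `threshold` parameter are all assumptions made for this example.

```python
def count_isolated_components(spectrogram, threshold=0.0):
    """Count time-frequency components with no active direct neighbor.

    spectrogram[t][k] is the output magnitude at frame t and bin k.
    A component is "active" when its magnitude exceeds threshold.
    """
    frames = len(spectrogram)
    count = 0
    for t in range(frames):
        for k in range(len(spectrogram[t])):
            if spectrogram[t][k] <= threshold:
                continue
            # Neighbors: previous/next frame, lower/upper frequency bin.
            neighbors = [(t - 1, k), (t + 1, k), (t, k - 1), (t, k + 1)]
            has_active_neighbor = any(
                0 <= tt < frames
                and 0 <= kk < len(spectrogram[tt])
                and spectrogram[tt][kk] > threshold
                for tt, kk in neighbors
            )
            if not has_active_neighbor:
                count += 1
    return count


if __name__ == "__main__":
    out = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],  # alone in time and frequency
        [0.0, 0.0, 0.0, 2.0],  # continues into the next frame
        [0.0, 0.0, 0.0, 3.0],
    ]
    print(count_isolated_components(out))
```

A count of this kind is only a rough proxy; it does not measure how audible the residual components are.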
In order to reduce the aforementioned musical noise, JP 2010-055024A and JP 2010-160246A each disclose a technology for switching between two different noise suppressing methods in accordance with the property of an input spectrum.
The technology discussed in JP 2010-055024A includes section determining means configured to determine whether or not a noise component is dominant in a section, first noise suppressing means configured to collect frequency bands into a first number of groups and to suppress a noise component for each group, and second noise suppressing means configured to collect frequency bands into a second number of groups that is larger than the first number and to suppress a noise component for each group. If the section determining means determines that “a noise component is dominant”, the noise component is suppressed by the first noise suppressing means. If the section determining means determines that “a noise component is not dominant”, the noise component is suppressed by the second noise suppressing means. Because the first noise suppressing means divides the spectrum into a small number of groups, each of which collects many frequency bins (i.e., it has coarse frequency resolution), the occurrence of isolated frequency components is prevented. As a result, musical noise can be reduced, but a speech component becomes distorted. On the other hand, because the second noise suppressing means divides the spectrum into a larger number of groups, each of which collects fewer frequency bins (i.e., it has fine frequency resolution), a speech component is less likely to become distorted. However, since isolated frequency components occur, musical noise occurs in a section where a noise component is dominant. Therefore, the technology discussed in JP 2010-055024A switches between these two noise suppressing means in accordance with whether or not a noise component is dominant in a section, so as to reduce both the occurrence of musical noise and the distortion of a speech component.
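The switching between coarse and fine grouping can be sketched as follows. This is a hedged illustration, not the implementation of JP 2010-055024A: it reads the first and second group numbers as counts of groups, splits the bins into contiguous groups of nearly equal size, and applies one subtraction-type gain per group. The gain rule and all function and parameter names are assumptions.

```python
def grouped_suppression(input_mag, noise_mag, num_groups):
    """Apply one common gain to every bin of each contiguous group."""
    n_bins = len(input_mag)
    if len(noise_mag) != n_bins:
        raise ValueError("spectra must have the same number of bins")
    if not 1 <= num_groups <= n_bins:
        raise ValueError("num_groups must be between 1 and the bin count")
    output = []
    for g in range(num_groups):
        start = g * n_bins // num_groups
        stop = (g + 1) * n_bins // num_groups
        in_sum = sum(input_mag[start:stop])
        noise_sum = sum(noise_mag[start:stop])
        # Assumed gain rule: subtraction-type gain from group totals.
        gain = max(1.0 - noise_sum / in_sum, 0.0) if in_sum > 0.0 else 0.0
        output.extend(gain * x for x in input_mag[start:stop])
    return output


def switched_suppression(input_mag, noise_mag, noise_dominant,
                         first_groups, second_groups):
    """Use few groups (coarse) when noise dominates, else many (fine)."""
    if first_groups >= second_groups:
        raise ValueError("second_groups must exceed first_groups")
    num_groups = first_groups if noise_dominant else second_groups
    return grouped_suppression(input_mag, noise_mag, num_groups)


if __name__ == "__main__":
    x = [4.0, 4.0, 8.0, 8.0]
    n = [2.0, 6.0, 2.0, 2.0]
    print(switched_suppression(x, n, True, 1, 4))   # coarse: one gain
    print(switched_suppression(x, n, False, 1, 4))  # fine: per-bin gain
```

With the coarse setting every bin shares one gain, so no single bin can be left standing alone; with the fine setting the second bin is zeroed independently of its neighbors, which is the behavior that can leave isolated components.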
The technology discussed in JP 2010-160246A includes kurtosis-index-value calculating means configured to calculate a kurtosis index value indicating a degree by which the kurtosis in the intensity distribution of a speech signal (spectrum) has changed before and after a noise suppressing process, first noise suppressing means configured to use the SS method, and second noise suppressing means configured to use the MMSE-STSA method. A kurtosis index value is calculated for each of the first noise suppressing means and the second noise suppressing means, and a noise component is suppressed by the noise suppressing means with the smaller kurtosis index value. This is because a kurtosis index value has a positive correlation with the amount of musical noise occurring after a noise-component suppressing process. Therefore, the technology discussed in JP 2010-160246A switches between these two noise suppressing means in accordance with a kurtosis index value so as to reduce the occurrence of musical noise.
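The selection by kurtosis index value can be sketched as follows. The exact definition of the index in JP 2010-160246A is not given here, so the sketch assumes it is the ratio of the kurtosis after suppression to the kurtosis before suppression, with kurtosis taken as the fourth central moment divided by the squared variance. The function names and the dictionary of candidate outputs are hypothetical.

```python
def kurtosis(values):
    """Fourth central moment divided by the squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / (m2 * m2)


def kurtosis_index(before, after):
    """Assumed index: kurtosis after suppression over kurtosis before."""
    return kurtosis(after) / kurtosis(before)


def select_suppressor(input_power, candidates):
    """Return the name of the candidate with the smallest index.

    candidates maps a method name to the power spectrum it produced.
    """
    return min(
        candidates,
        key=lambda name: kurtosis_index(input_power, candidates[name]),
    )


if __name__ == "__main__":
    before = [1.0, 2.0, 3.0, 4.0]
    outputs = {
        "ss": [0.0, 0.0, 0.0, 4.0],         # one surviving peak
        "mmse_stsa": [0.0, 0.0, 1.0, 1.0],  # flatter distribution
    }
    print(select_suppressor(before, outputs))
```

The output with a single surviving peak has the heavier-tailed intensity distribution and therefore the larger index, so the other candidate is selected.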