1. Field of the Invention
The present invention relates to a system for reducing a noise element from a noise superposed voice signal such as environmental noise, etc., and more specifically to a noise reduction apparatus and a noise reducing method for reducing a noise element from a nonvoice environmental noise superposed voice signal input from a microphone in, for example, a mobile telephone system, an IP phone system, etc., improving a signal-to-noise ratio (SNR), and enhancing the speech communication quality.
2. Description of the Related Art
Recently, digital mobile communications systems such as mobile telephones, etc. have become widespread. In such communications, the communications are commonly established with large environmental noise, and it is important to effectively suppress the noise element contained in a voice signal.
In the above-mentioned noise suppression technology, for example, an input signal on a time axis is converted into a signal on a frequency axis (amplitude spectrum and phase spectrum), a suppression gain is obtained from the background noise estimated by a signal of a nonvoice interval, an amplitude spectrum is suppressed, the phase spectrum and the suppressed amplitude spectrum are restored into a signal on a time axis, thereby eliminating the noise (FIG. 1).
The problem with the above-mentioned conventional technology is described below by referring to the following four documents.
[Nonpatent Document] S. F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction”, IEEE Transaction on Acoustics, Speech, and Signal Processing, ASSP-33, vol. 27, pp. 113-120, (1979)
[Patent Document 1] Japanese Patent Publication No. 3269969 “Background Noise Elimination Apparatus
[Patent Document 2] Japanese Patent Publication No. 3437264 “Noise Suppression Apparatus”
[Patent Document 3] Japanese Patent Application Laid-open No. 2002-73066 “Noise Suppression Apparatus and Noise Suppressing Method”
In Nonpatent Document 1, the technology of spectrum subtraction, obtaining suppressed amplitude spectrum by subtracting the amplitude spectrum of the estimated noise from the input amplitude spectrum, is proposed.
In Patent Document 1, an input signal is converted into a signal on a frequency axis, and a suppression gain is calculated based on the signal-to-noise ratio (SNR) calculated from the input signal and the estimated noise. The method of calculating a suppression gain is to empirically set a relational expression between the SNR and the suppression gain.
In Patent Document 2, when the power in the estimated nonvoice interval is small, the suppression level is lowered to avoid the degradation by suppressed voice interval of small power. When the power in the nonvoice interval is large, the suppression level is enhanced to further suppressing the nonvoice interval, thereby more appropriately suppressing the noise in the nonvoice interval.
In Patent Document 3, the power of a voice signal is obtained from the smoothing spectrum power in a voice-recognized interval, and the power of a no-voice signal is obtained from the smoothing spectrum power in a voice-unrecognized interval, thereby calculating the SNR, strongly suppressing noise on the signal portion having a high SNR, and restricting suppression on the portion distorted by suppression.
However, in the above-mentioned conventional technology, when the estimation of the background noise is incorrect, no appropriate suppression gain can be obtained, and the noise-suppressed voice signal is degraded. For example, when much bubble noise (background noise containing human voice) is contained in the background noise, the interval of bubble noise is not determined as a nonvoice interval, and estimated noise is calculated in an interval of constant noise other than the bubble noise. When the power of the constant noise is smaller than the power of the bubble noise, the estimated noise is underestimated in bubble noise interval, thereby causing insufficient suppression, that is, sufficient suppression cannot be realized.
In Patent Document 2, the power in the estimated voice interval is estimated as the maximum value of the short interval power in a long interval without considering the distribution of voice power. When the distribution of voice power changes depending on the characteristic of human voice and the speaking style is not considered, there is the problem that an appropriate suppression coefficient cannot be necessarily calculated. For example, when the distribution of the voice power is widely performed, there is voice having small power although the maximum value of the voice power is large. Therefore, the voice can be degraded if the suppression is too strong.
Thus, since the pure voice power, which is obtained by subtracting the noise element from an input voice signal, is not detected and its distribution is not estimated in the conventional technology, an appropriate suppression gain cannot be calculated when the background noise is mistakenly estimated.