The present invention relates to a method, an apparatus and a program for signal processing that realize a function of suppressing noise superposed on a desired voice signal, and more particularly to a method, an apparatus and a program for signal processing that perform the suppression at a position close to a reproducing device such as a speaker.
Conventionally, a noise suppressor (noise suppression system) is a system for suppressing noise superposed on a desired voice signal. In general, it estimates the power spectrum of the noise component from the input signal converted into the frequency domain, and subtracts the estimated power spectrum from the input signal, thereby suppressing the noise mixed into the desired voice signal. Because the power spectrum of the noise component is estimated continuously, the approach can also be applied to suppression of non-stationary noise. One such noise suppressor uses the scheme described in Patent Document 1 (JP-P2002-204175A), for example.
Another noise suppressor, an implementation having reduced computational complexity, uses the scheme described in Non-Patent Document 1 (Proceedings of ICASSP, Vol. I, pp. 473-476, May 2006).
These schemes share the same basic operation. An input signal is converted into the frequency domain with a linear transform; an amplitude component is extracted; and a suppression coefficient is calculated for each frequency component. Then, for each frequency component, the product of the suppression coefficient and the amplitude is combined with the phase of that frequency component, and the result is inversely converted to obtain a noise-suppressed output. The suppression coefficient has a value between zero and one, where a suppression coefficient of zero represents complete suppression and results in a zero output, and a suppression coefficient of one causes the input to be output as it is without suppression.
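The basic operation described above can be illustrated by the following minimal sketch, which is not the implementation of Patent Document 1 or Non-Patent Document 1. It assumes a plain discrete Fourier transform as the linear transform and processes a single frame without windowing or overlap. The function names `dft`, `idft`, `subtraction_gains` and `suppress_frame` are hypothetical, and the power-subtraction rule used to derive the suppression coefficients is only one possible choice, assumed here for illustration.

```python
import cmath
import math


def dft(frame):
    """Convert a time-domain frame into the frequency domain (plain DFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; the real part is returned because the frame is real."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def subtraction_gains(signal_power, noise_power):
    """Assumed rule: gain = sqrt(max(0, 1 - noise power / signal power))."""
    return [
        math.sqrt(max(0.0, 1.0 - n / p)) if p > 0.0 else 0.0
        for p, n in zip(signal_power, noise_power)
    ]


def suppress_frame(frame, gains):
    """Scale each amplitude by its coefficient, keep the phase, and invert."""
    output_spectrum = []
    for value, gain in zip(dft(frame), gains):
        gain = min(1.0, max(0.0, gain))          # coefficient lies in [0, 1]
        amplitude, phase = cmath.polar(value)    # split into amplitude and phase
        output_spectrum.append(cmath.rect(gain * amplitude, phase))
    return idft(output_spectrum)
```

With all coefficients set to one the frame is returned unchanged, and with all coefficients set to zero the output is zero, matching the two limiting cases described above.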
The most common application for the noise suppressor is in cell phone communication, as shown in FIG. 29. A transmitter terminal 7000 is comprised of a noise suppressor 710, an encoder 720, and a transmitter 730. The noise suppressor 710 is supplied with an input signal via an input terminal 700. In a common cell phone, the input terminal 700 is supplied with a signal picked up by a microphone (microphone signal). The microphone signal is composed of a voice itself and a background noise, and the noise suppressor 710 suppresses only the background noise while keeping the voice as intact as possible, and transmits the noise-suppressed voice to the encoder 720. The encoder 720 encodes the noise-suppressed voice supplied from the noise suppressor 710 based on an encoding scheme such as CELP. The encoded information is transferred to the transmitter 730 and subjected to modulation, amplification, etc., and thereafter is supplied to a transmission path 800. That is, the transmitter terminal 7000 applies a noise suppressor, then performs processing such as voice encoding, and sends the signal to the transmission path.
A receiver terminal 9000 is comprised of a receiver 930 and a decoder 920. The receiver 930 demodulates a signal received from the transmission path 800, digitizes it, and then transfers it to the decoder 920. The decoder 920 decodes the signal received from the receiver 930, and transfers an audible signal to an output terminal 900. The signal obtained at the output terminal 900 is supplied to a speaker for reproduction as an acoustic signal.
In noise suppression with one input, there is generally a tradeoff between residual noise and output distortion, so that low residual noise and low output distortion cannot be achieved at the same time. Moreover, the most comfortable combination of residual noise and output distortion differs from user to user, so that it is impossible to preset an audio quality that satisfies a plurality of users. Accordingly, noise suppression is sometimes performed so as to avoid an increase of output distortion due to excessive suppression while tolerating a certain degree of residual noise. Moreover, to improve encoding efficiency in a signal segment containing no voice, the encoder 720 in the transmitter terminal 7000 sometimes has a discontinuous transmission (DTX) function, by which only the background noise level is encoded with a smaller amount of information. In this case, the decoder 920 in the receiver terminal 9000 has a comfort noise generation (CNG) function, which generates a noise (comfort noise) according to the transmitted background noise level.
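The DTX and CNG functions mentioned above can be pictured with the following minimal sketch. It is not taken from any particular codec: it assumes that the background noise level is represented by the RMS value of a frame and that the comfort noise is white Gaussian noise scaled to that level. The function names `encode_noise_level` and `generate_comfort_noise` are hypothetical.

```python
import math
import random


def encode_noise_level(frame):
    """Encoder side (DTX): reduce a voice-free frame to a single RMS level."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def generate_comfort_noise(level, length, rng=None):
    """Decoder side (CNG): generate noise whose RMS equals the received level."""
    if rng is None:
        rng = random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    rms = math.sqrt(sum(s * s for s in noise) / length)
    if rms == 0.0:
        return [0.0] * length
    # Normalize to unit RMS, then scale to the transmitted level.
    return [level * s / rms for s in noise]
```

In this sketch the level of the comfort noise is determined entirely by the value received from the transmitter side and by the decoder, with no adjustment by the listener.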
However, the conventional configuration described with reference to FIG. 29 does not allow a user of the receiver terminal 9000 to operate the noise suppressor 710, because the noise suppressor 710 is placed temporally and spatially remote from that user. Accordingly, when the noise suppressor 710 leaves a high residual noise, or when the function of the noise suppressor 710 is disabled, in the configuration shown in FIG. 29, there arises a problem that the user of the receiver terminal 9000 is forced to listen to a low-quality voice having a high background noise. Moreover, there is another problem that some users may find the comfort noise objectionable, because the decoder 920 generates it at too high a level.