A noise suppressor (noise suppression system) is a system that suppresses noise superposed on a desired sound signal. As a rule, it estimates the power spectrum of the noise component from the input signal converted into the frequency domain, and subtracts the estimated power spectrum from the input signal, thereby suppressing the noise that coexists with the desired sound signal. Because the power spectrum of the noise component is estimated successively, the noise suppressor can also be applied to the suppression of non-stationary noise. One example of such a noise suppressor is the technique described in Patent document 1.
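The subtraction of a successively estimated noise power spectrum can be illustrated with a minimal sketch. It is not the method of Patent document 1. The function name `subtract_noise_power`, the smoothing factor `alpha`, and the update threshold `update_ratio` are hypothetical, and the sketch assumes that the first frame contains noise only.

```python
def subtract_noise_power(frames, alpha=0.9, update_ratio=2.0):
    """Subtract a running noise power estimate from each frame.

    frames: list of frames, each a list of power values per frequency bin.
    alpha, update_ratio: hypothetical tuning values, not taken from the source.
    """
    # Assumption: the first frame contains noise only.
    noise = list(frames[0])
    cleaned = []
    for frame in frames:
        out = []
        for k, power in enumerate(frame):
            # Update the noise estimate only where the bin looks noise-like,
            # so that the estimate can follow slowly changing noise.
            if power <= update_ratio * noise[k]:
                noise[k] = alpha * noise[k] + (1.0 - alpha) * power
            # Subtract the estimate and floor the result at zero.
            out.append(max(power - noise[k], 0.0))
        cleaned.append(out)
    return cleaned
```

The averaging in the update step is also what makes an estimator of this kind slow to react to very short noise bursts.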
In addition, Non-patent document 1 describes a technique that reduces the amount of computation.
These techniques share the same basic operation. That is, the input signal is converted into the frequency domain with a linear transform, an amplitude component is extracted, and a suppression coefficient is calculated for each frequency component. The product of the suppression coefficient and the amplitude in each frequency component is combined with the phase of that frequency component, and the result is subjected to an inverse conversion to obtain a noise-suppressed output. The suppression coefficient takes a value ranging from zero to one (1). When the suppression coefficient is zero, the output is completely suppressed, namely, the output is zero. When the suppression coefficient is one (1), the input is outputted as it stands without suppression. The suppression coefficient is calculated from an estimated value of the noise together with the input signal. Various techniques exist for estimating the noise. For example, the weighted noise estimation technique disclosed in Patent document 1 can be employed. However, the conventional noise estimation techniques, including the weighted noise estimation, involve an averaging operation as part of the estimation, and are therefore not capable of estimating shock noise such as key typing noise.
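The basic operation described above (transform, per-component suppression coefficient, recombination with the phase, inverse conversion) can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the procedure of Patent document 1 or Non-patent document 1: a naive discrete Fourier transform stands in for the linear transform, and the rule used to derive the coefficient from `noise_amplitude` is a hypothetical amplitude-subtraction gain clamped to the range from zero to one.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform (stand-in for the linear transform)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def suppress_frame(samples, noise_amplitude):
    """Apply a per-component suppression coefficient to one frame.

    noise_amplitude: estimated noise amplitude per frequency component
    (how it is estimated is outside the scope of this sketch).
    """
    output_spectrum = []
    for k, value in enumerate(dft(samples)):
        amplitude = abs(value)
        phase = cmath.phase(value)
        # Hypothetical coefficient rule, clamped to [0, 1]:
        # 0 suppresses the component completely, 1 passes it unchanged.
        if amplitude > 0.0:
            gain = max(0.0, min(1.0, 1.0 - noise_amplitude[k] / amplitude))
        else:
            gain = 0.0
        # Combine the suppressed amplitude with the original phase.
        output_spectrum.append(cmath.rect(gain * amplitude, phase))
    return idft(output_spectrum)
```

With a noise estimate of zero in every component the frame passes through unchanged, and with a very large estimate the output is zero, which matches the two limiting cases of the coefficient.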
On the other hand, Non-patent document 2 discloses a method that is specialized for use on a personal computer and suppresses key typing noise by employing the press-down information and the release information of the keys. This method predicts the input signal intensity in a specific region of the time/frequency plane and, on the assumption that signals other than key typing noise do not change drastically in time or frequency, determines that the signal is key typing noise when the difference between the predicted value and the actual intensity is large. To enhance the detection precision of the key typing noise, both the press-down information and the release information of the key are used.
A configuration of the noise suppressor disclosed in Non-patent document 2 is shown in FIG. 34. A degraded sound signal (a signal in which the desired signal and the shock noise coexist) is supplied as a sample value sequence to an input terminal 1 of FIG. 34. The degraded sound signal is subjected to a transformation such as a Fourier transform in a conversion unit 2, is divided into a plurality of frequency components, and is supplied to a shock noise detection unit 18 and a shock noise suppression unit 19. The key release information and the key press-down information are supplied to the shock noise detection unit 18 from input terminals 91 and 92, respectively. The shock noise detection unit 18 detects the key typing noise by employing the difference between the predicted value and the actual value of the input signal intensity in the specific region of the time/frequency plane. First, the shock noise detection unit 18 calculates the amplitude of the current frame by linear prediction using the amplitudes of the immediately preceding frame and the frames before it. Next, it calculates a sound likelihood based on the difference between the predicted amplitude and the actual amplitude. When the key press-down information or the key release information is conveyed from the input terminal 92 or the input terminal 91, the shock noise detection unit 18 sets the existence probability of the shock noise to 1 for the frame whose sound likelihood is smallest among a plurality of frames before and after the current frame. The shock noise detection unit 18 sets the existence probability of the shock noise to 0 (zero) for all other frames, including the frames for which neither the key press-down information nor the key release information has been notified. The existence probability of the shock noise is supplied to the shock noise suppression unit 19.
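The detection steps of the shock noise detection unit 18 (linear prediction, sound likelihood, selection of the least sound-like frame around a key event) can be sketched as follows. This is a simplified illustration, not the procedure of Non-patent document 2: each frame is reduced to a single amplitude value rather than a region of the time/frequency plane, the prediction coefficients `coeffs` are fixed hypothetical values, the likelihood formula is an assumed decreasing function of the prediction error, and `search_radius` is a hypothetical window size.

```python
def predict_amplitude(history, coeffs):
    """Linear prediction from past amplitudes (most recent frame last).

    coeffs[0] weights the most recent frame, coeffs[1] the one before it.
    """
    return sum(c * a for c, a in zip(coeffs, reversed(history)))


def sound_likelihood(actual, predicted):
    """Assumed likelihood: near 1 for a small prediction error, near 0 for a large one."""
    return 1.0 / (1.0 + (actual - predicted) ** 2)


def shock_noise_probability(amplitudes, key_event_frames,
                            coeffs=(0.5, 0.5), search_radius=2):
    """Return a 0/1 existence probability of shock noise for every frame.

    key_event_frames: frame indices at which key press-down or key release
    information was notified.
    """
    order = len(coeffs)
    # Frames without enough history are treated as fully sound-like.
    likelihood = [1.0] * len(amplitudes)
    for t in range(order, len(amplitudes)):
        predicted = predict_amplitude(amplitudes[t - order:t], coeffs)
        likelihood[t] = sound_likelihood(amplitudes[t], predicted)

    probability = [0] * len(amplitudes)
    for t in key_event_frames:
        first = max(0, t - search_radius)
        last = min(len(amplitudes), t + search_radius + 1)
        # The frame with the smallest sound likelihood near the key event
        # is marked as containing shock noise.
        worst = min(range(first, last), key=lambda i: likelihood[i])
        probability[worst] = 1
    return probability
```

Because the probability is set to 1 only inside a window around a notified key event, a frame with a large prediction error but no key event nearby stays at 0.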
For each frame whose existence probability of the shock noise is 1, the shock noise suppression unit 19 calculates the amplitude with a statistical technique by employing the amplitudes of the immediately preceding frame and the immediately following frame, and outputs it as the amplitude of the emphasized sound. The precision of the estimated amplitude can be improved by locally calculating the mean and the variance of the statistical model being used and adaptively controlling these values. The specific calculation procedure is disclosed in Non-patent document 2, so its explanation is omitted. For a frame whose existence probability of the shock noise is 0, nothing is done, and the amplitude of the inputted degraded sound is conveyed as it stands to an inverse conversion unit 3 as the amplitude of the emphasized sound. The inverse conversion unit 3 combines the power spectrum of the shock-noise-suppressed sound supplied from the shock noise suppression unit 19 with the phase of the degraded sound supplied from the conversion unit 2, inverse-converts the result, and supplies it to an output terminal 4 as an emphasized sound signal sample.
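The frame-wise behaviour of the shock noise suppression unit 19 can be sketched as follows. Since the statistical calculation of Non-patent document 2 is not reproduced here, this sketch substitutes a plain mean of the neighbouring frames, which is an assumption made only for illustration. The function name `suppress_shock_frames` is hypothetical, and each frame is again reduced to a single amplitude value.

```python
def suppress_shock_frames(amplitudes, probability):
    """Replace the amplitude of frames flagged as shock noise.

    amplitudes: amplitude of the degraded sound per frame.
    probability: existence probability of shock noise per frame (0 or 1).
    """
    output = list(amplitudes)
    for t, p in enumerate(probability):
        if p != 1:
            # Probability 0: the degraded amplitude is passed on unchanged.
            continue
        # Collect the frame just before and the frame just after, if present.
        neighbours = []
        if t > 0:
            neighbours.append(amplitudes[t - 1])
        if t + 1 < len(amplitudes):
            neighbours.append(amplitudes[t + 1])
        if neighbours:
            # Stand-in estimate: mean of the neighbouring amplitudes.
            output[t] = sum(neighbours) / len(neighbours)
    return output
```

For example, `suppress_shock_frames([1.0, 1.0, 5.0, 3.0], [0, 0, 1, 0])` replaces the third amplitude with the mean of its two neighbours and leaves the other frames untouched.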
Patent document 1: JP-P2002-204175A
Non-patent document 1: PROCEEDINGS OF ICASSP, Vol. 1, pp. 473 to 476, May, 2006
Non-patent document 2: PROCEEDINGS OF ICSLP, pp. 261 to 264, September, 2006