As digital signal processing technology has advanced in recent years, voice calls made outdoors on mobile phones, hands-free calls made in vehicles, and hands-free operation using voice recognition have become widespread. Because the devices implementing these functions are often used in high-level noise environments, background noise is input to the microphone together with the voice, which causes degradation of the call voice, a reduction in the voice recognition rate, and so on. Therefore, in order to implement comfortable voice calls and high-accuracy voice recognition, a noise suppression device that suppresses the background noise mixed into the input signal is needed.
As a conventional noise suppression method, for example, there is a method of transforming a time-domain input signal into a power spectrum, which is a frequency-domain signal; calculating a suppression amount for noise suppression by using the power spectrum of the input signal and an estimated noise spectrum that is separately estimated from the input signal; carrying out amplitude suppression on the power spectrum of the input signal by using the calculated suppression amount; and transforming the amplitude-suppressed power spectrum, together with the phase spectrum of the input signal, back into the time domain to acquire a noise-suppressed signal (refer to nonpatent reference 1).
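The steps of this conventional method can be illustrated with a minimal single-frame sketch. It is not the algorithm of nonpatent reference 1: the function name `suppress_frame`, the power-subtraction gain rule, and the `floor` parameter are assumptions chosen only to make the transform, suppress, and inverse-transform sequence concrete.

```python
import numpy as np

def suppress_frame(frame, noise_power, floor=0.1):
    """Suppress noise in one time-domain frame (hypothetical helper).

    frame:       real-valued samples of one analysis frame
    noise_power: estimated noise power per frequency bin (len(frame) // 2 + 1 values)
    floor:       assumed lower limit on the gain applied to the power spectrum
    """
    spectrum = np.fft.rfft(frame)          # time domain -> frequency domain
    power = np.abs(spectrum) ** 2          # power spectrum of the input signal
    phase = np.angle(spectrum)             # phase spectrum, reused unchanged

    # Suppression amount: an assumed power-subtraction rule, limited by the floor.
    gain = np.maximum(1.0 - noise_power / np.maximum(power, 1e-12), floor)
    suppressed_power = gain * power        # amplitude suppression

    # Recombine the suppressed amplitude with the original phase and go back
    # to the time domain.
    out_spectrum = np.sqrt(suppressed_power) * np.exp(1j * phase)
    return np.fft.irfft(out_spectrum, n=len(frame))
```

A practical device would also window the frames, overlap-add the outputs, and update the noise estimate over time; those steps are omitted here.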
In this conventional noise suppression method, the suppression amount is calculated on the basis of the ratio (referred to as the SN ratio from here on) between the power spectrum of the voice and the estimated noise power spectrum; however, the suppression amount cannot be calculated correctly when this ratio, expressed in decibels, is negative. For example, in a voice signal on which the noise occurring while a vehicle is travelling, which has large power in the low frequency range, is superimposed, the low-frequency component of the voice is buried in the noise, and the SN ratio therefore becomes negative. The problem is that this results in excessive suppression of the low-frequency component of the voice signal, and hence in degradation of the voice quality.
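The effect of a negative SN ratio on the suppression amount can be illustrated numerically. The sketch below assumes a Wiener-type gain, SNR / (1 + SNR), which is a common textbook rule and not necessarily the rule used in nonpatent reference 1; the function name `wiener_gain` is hypothetical.

```python
import math

def wiener_gain(snr_db):
    """Amplitude gain of an assumed Wiener-type rule for an SN ratio given in dB."""
    snr = 10.0 ** (snr_db / 10.0)   # convert decibels to a linear power ratio
    return snr / (1.0 + snr)

# Print the gain, and the equivalent attenuation in dB, for a few SN ratios.
for snr_db in (10.0, 0.0, -10.0):
    gain = wiener_gain(snr_db)
    print(f"SN ratio {snr_db:+.0f} dB -> gain {gain:.3f} ({20.0 * math.log10(gain):.1f} dB)")
```

Under this assumed rule the gain shrinks toward zero as the SN ratio becomes negative, so a voice component buried in low-frequency noise is attenuated together with the noise.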
To solve the above-mentioned problem, methods have been proposed that efficiently extract a voice signal, which is the object signal, by using a plurality of microphones (a microphone array), thereby implementing high-quality noise suppression even under high-level noise conditions. For example, nonpatent reference 2 discloses a beamforming method, and patent reference 1 discloses a voice-collecting device having a function of extracting an object signal.
According to nonpatent reference 2, a high-quality noise suppression device is implemented that uses spatial information, such as the phase difference occurring when an object signal from a sound source reaches each of the microphones, to synthesize the signals from the microphones and enhance the object signal, thereby improving the SN ratio between the voice signal, which is the object signal, and the noise.
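One simple form of beamforming is delay-and-sum, sketched below. It is given only as an illustration of synthesizing microphone signals by using arrival-time (phase) differences and is not necessarily the method of nonpatent reference 2. The function name `delay_and_sum` and the restriction to known, non-negative integer sample delays are assumptions.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align and average microphone signals (hypothetical helper).

    channels: array of shape (n_mics, n_samples)
    delays:   arrival delay of the object signal at each microphone,
              as non-negative integer sample counts (assumed known)
    """
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for signal, delay in zip(channels, delays):
        # Advance each channel by its arrival delay so that the object
        # signal lines up across microphones before summation.
        out[: n_samples - delay] += signal[delay:]
    return out / n_mics

# Usage: the same source reaches three microphones with different delays.
source = np.sin(2 * np.pi * 0.01 * np.arange(200))
delays = [0, 2, 5]
channels = np.stack([np.roll(source, d) for d in delays])
enhanced = delay_and_sum(channels, delays)
```

After alignment the object signal adds coherently, whereas noise that is uncorrelated between microphones adds incoherently, which is why the SN ratio improves. The last `max(delays)` output samples are only partially summed and should be discarded.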
Further, patent reference 1 discloses, as a technology for extracting an object signal in a noisy environment, a method that uses the difference in sound field distribution between the object signal and noise to extract, on the frequency axis, the frequency components in which the object signal is dominant. The method disclosed by patent reference 1 is subject to the condition that a main input microphone is located close to the sound source of the object signal and an auxiliary input microphone is located farther from that sound source than the main input microphone. The frequency components in which the object signal is dominant are extracted by exploiting the fact that the level difference occurring between these two microphones has different characteristics for noise and for the object signal, thereby achieving an improvement in the sound quality.
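The idea of selecting frequency components by the level difference between the two microphones can be sketched as follows. This is an illustration under stated assumptions, not the procedure of patent reference 1: the function name `extract_object_bins`, the fixed `threshold_db`, and the `attenuation` applied to the remaining bins are all hypothetical.

```python
import numpy as np

def extract_object_bins(main_frame, aux_frame, threshold_db=6.0, attenuation=0.1):
    """Keep the bins in which the main microphone is clearly louder (hypothetical helper)."""
    main_spec = np.fft.rfft(main_frame)
    aux_spec = np.fft.rfft(aux_frame)
    eps = 1e-12  # avoids division by zero and log of zero in empty bins

    # Level difference per frequency bin between the main and auxiliary microphones.
    level_diff_db = 10.0 * np.log10(
        (np.abs(main_spec) ** 2 + eps) / (np.abs(aux_spec) ** 2 + eps)
    )

    # A nearby object signal is much stronger at the main microphone, whereas
    # distant noise arrives at a similar level at both microphones.
    mask = np.where(level_diff_db > threshold_db, 1.0, attenuation)
    return np.fft.irfft(mask * main_spec, n=len(main_frame)), mask

# Usage: a "voice" tone that is 12 dB weaker at the auxiliary microphone,
# plus a "noise" tone at equal level on both microphones.
t = np.arange(64)
voice = np.sin(2 * np.pi * 5 * t / 64)
noise = np.sin(2 * np.pi * 20 * t / 64)
output, mask = extract_object_bins(voice + noise, 0.25 * voice + noise)
```

In this example the bin containing the voice tone is kept and the bin containing the noise tone is attenuated.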