When a specific speech signal or the like is to be recorded in various environments, it is difficult to capture only the target sound with a microphone because various noise sources are present in the surrounding environment; some form of noise reduction processing or sound source separation processing is therefore necessary.
An automotive environment is an example in which such processing is especially needed. With the spread of mobile phones, a telephone call made while driving typically uses a microphone installed in the car far from the speaker's mouth, and speech quality is therefore significantly degraded. Likewise, when speech recognition is performed while driving, the utterance is made under similar conditions, so speech recognition performance is degraded. Recent progress in speech recognition technology has made speech recognition much more robust against stationary noise than before. However, current speech recognition technology still suffers degraded recognition performance when a plurality of speakers speak simultaneously. Because it does not provide high recognition accuracy for the mixed speech of two simultaneous speakers, a passenger other than the speaker is not allowed to speak while the speech recognition device is in use, which restricts the behavior of the passenger. Although a principal independent component analysis method or the like has been applied to separate such sound sources, it is not sufficiently practical because of computational complexity, variations in the number of sound sources, and the like.
To solve the above-described problems, various methods have been proposed that use a plurality of microphones installed in a car cabin to record only speech arriving from a certain direction. However, it is difficult to secure space for installing many microphones in a car cabin. Furthermore, cost constraints make it difficult to use microphones having uniform characteristics. Therefore, a method is desired that works with as few microphones as possible and tolerates unevenness in the characteristics of the microphones.
Generally, when a plurality of microphones are used, the lower the cost of the microphones, the larger the unevenness in their sensitivity characteristics, and the unevenness in frequency characteristics is said to be about ±3 dB. In an addition type array such as a delay-and-sum array, this unevenness merely prevents the microphone array from achieving its designed performance. In a so-called subtraction type array such as an adaptive array, however, this unevenness may make performance worse than in a case where one microphone is used, especially at low frequencies of about 1 kHz or less.
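The difference between the two array types can be illustrated with a minimal numerical sketch. It is an illustration only and is not taken from any of the cited documents. It assumes two microphones, a hypothetical spacing of 5 cm, a speed of sound of 343 m/s, ideal plane waves, and a sensitivity mismatch of 3 dB on the second microphone; all function names are hypothetical. The addition type array is steered to broadside, and the subtraction type array places its null at broadside, where the noise is assumed to be, while the target is assumed to arrive from endfire.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def db_to_gain(db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def mic_signals(freq_hz, angle_deg, spacing_m, gains):
    """Complex amplitudes at two microphones for a unit plane wave.

    angle_deg is measured from broadside (0 = equal arrival time at both
    microphones). gains holds the assumed linear sensitivity of each one.
    """
    delay = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    phase = cmath.exp(-2j * math.pi * freq_hz * delay)
    return gains[0] * 1.0, gains[1] * phase


def delay_and_sum(x1, x2):
    """Addition type array steered to broadside: average the two channels."""
    return abs(x1 + x2) / 2.0


def subtractive(x1, x2):
    """Subtraction type array: the difference places a null at broadside."""
    return abs(x1 - x2)


def noise_to_target_db(freq_hz, mismatch_db, spacing_m=0.05):
    """Residual broadside noise relative to an endfire target, in dB,
    at the output of the subtraction type array."""
    gains = (1.0, db_to_gain(mismatch_db))
    noise = subtractive(*mic_signals(freq_hz, 0.0, spacing_m, gains))
    target = subtractive(*mic_signals(freq_hz, 90.0, spacing_m, gains))
    if noise == 0.0:
        return float("-inf")  # perfect null when the microphones match
    return 20.0 * math.log10(noise / target)


if __name__ == "__main__":
    # Print the noise-to-target ratio for a 3 dB mismatch at several frequencies.
    for freq in (200.0, 500.0, 1000.0, 4000.0):
        print(freq, round(noise_to_target_db(freq, 3.0), 1))
```

Under these assumptions, the mismatch only shifts the broadside gain of the addition type array by a fixed amount. In the subtraction type array, the mismatch fills in the null, and because the response to the target shrinks as the frequency falls, the noise-to-target ratio moves toward 0 dB, that is, toward the level a single microphone would give.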
Unevenness in the characteristics of the microphones serving as sensors is a critical problem for microphone array technology. To solve this problem, methods for equalizing the sensitivities of a plurality of microphone elements have been proposed in Patent Documents 1 to 5 and elsewhere.
Conventionally, for microphone arrays that use adaptive beamforming processing technology, which is highly effective with a small number of microphones, various methods such as the generalized sidelobe canceller (GSC), the Frost type beamformer, and the reference signal method have been known, as described in, for example, Non-Patent Documents 1 and 2.
Adaptive beamforming processing basically suppresses noise by means of a filter that forms a directional beam having a directional null (blind spot) in the direction of a noise source, and among such methods the generalized sidelobe canceller is known to have relatively high performance. However, the GSC has a problem in that the target signal is suppressed and degraded when it arrives from a direction different from the specified direction of the target signal source. To deal with this problem, Patent Documents 6 and 7 disclose a method that performs processing in the frequency domain to reduce computational complexity, successively detects the direction of the speaker and the direction of a specific noise by using filter coefficients in the frequency domain, separates the target sound from other noise to some extent, and additionally applies spectral subtraction so that noise from an unknown arrival direction and diffuse noise are reduced.
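As a rough illustration of the spectral subtraction step mentioned above, the following is a minimal sketch of generic magnitude-domain spectral subtraction. It is not the specific procedure of Patent Documents 6 and 7. The function names, the over-subtraction factor, and the spectral floor are assumptions introduced only for this example. The noise magnitude in each frequency bin is assumed to be estimated by averaging frames that contain noise only.

```python
def estimate_noise(noise_frames):
    """Average the magnitude in each frequency bin over noise-only frames.

    noise_frames is a list of frames; each frame is a list of magnitudes,
    one per frequency bin. All frames are assumed to have the same length.
    """
    count = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / count for k in range(bins)]


def spectral_subtract(noisy_mag, noise_mag, over_subtraction=1.0, floor=0.05):
    """Subtract an estimated noise magnitude from a noisy magnitude spectrum.

    over_subtraction scales the noise estimate before subtraction, and
    floor keeps each bin at no less than a small fraction of its input so
    that the result never becomes negative. Both are assumed parameters.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for noisy, noise in zip(noisy_mag, noise_mag):
        reduced = noisy - over_subtraction * noise
        cleaned.append(max(reduced, floor * noisy))
    return cleaned


if __name__ == "__main__":
    # Two hypothetical noise-only frames, then one frame containing speech.
    noise = estimate_noise([[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]])
    print(spectral_subtract([5.0, 2.0, 0.5], noise))
```

In this sketch, a bin dominated by the target keeps most of its magnitude, while a bin at or below the estimated noise level is reduced to the floor. The cleaned magnitudes would normally be recombined with the phase of the noisy signal before conversion back to the time domain, which is omitted here.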
[Patent Document 1] JP5-131866A
[Patent Document 2] JP2002-99297A
[Patent Document 3] JP2003-153372A
[Patent Document 4] JP2004-343700A
[Patent Document 5] JP2004-289762A
[Patent Document 6] JP2001-100800A
[Patent Document 7] JP2000-47699A
[Non-Patent Document 1] The Institute of Electronics, Information and Communication Engineers, “Acoustic Systems and Digital Processing”
[Non-Patent Document 2] Haykin, “Adaptive Filter Theory (Prentice Hall)”