1. Field of the Invention
The present invention relates to a microphone system that performs adaptive signal processing on the signals output from two microphones and outputs a speaker's voice signal with an improved signal-to-noise ratio.
2. Related Art
Voice recognition technology has now advanced to the point where a recognition rate of about 95% can be achieved in an environment where an SN (signal-to-noise) ratio of more than 15 dB is obtained. However, conventional voice recognition systems have the property that the recognition rate decreases sharply as the SN ratio is lowered by surrounding noise. FIG. 16 illustrates the relationship between the SN ratio and the recognition capability for several types of microphones (omni-directional, unidirectional, narrow-directional, and AMNOR (Adaptive Microphone-array for Noise Reduction)); the relationship between the SN ratio and the recognition rate stays within a zone shaped roughly like an S-curve 100. As this drawing clearly shows, the recognition rate decreases sharply as the SN ratio decreases, reaching about 50% in an environment where the SN ratio is 0 dB.
Accordingly, inside the passenger compartment of a running car, which is filled with various noises (engine noise, road noise, pattern noise, whistling noise, etc.), deterioration of the foregoing recognition capability is unavoidable. This is a significant problem when incorporating a voice recognition system into a car.
In view of these circumstances, various systems have been proposed that reduce the influence of surrounding noise and receive the voice with a high SN ratio; one example is a high SN ratio voice reception system that uses plural microphones and digital signal processing. The simplest configuration of such a system, which uses two microphones, is illustrated in FIG. 17. More advanced systems have also been proposed, such as the Griffiths-Jim type array and the AMNOR.
In FIG. 17, 1 denotes a first microphone, 2 a second microphone, and 3 an adaptive signal processor that receives an error signal e and, as the reference signal, an output signal x2 from the microphone 2, and performs adaptive signal processing based on the LMS (Least Mean Square) algorithm so as to minimize the power of the error signal e. In the adaptive signal processor 3, 3a denotes an LMS calculator and 3b an adaptive filter configured, for example, as an FIR type digital filter. Through the adaptive signal processing, the LMS calculator 3a determines the coefficients of the adaptive filter 3b so as to minimize the power of the error signal e.
4 denotes a target response setter that receives the signal output from the microphone 1 as the target signal and delays it so as to satisfy causality. Letting d be the signal delay time corresponding to half the tap length of the adaptive filter 3b, the target response setter 4 has a delay characteristic of time d and a flat characteristic (gain of 1) in the audio frequency band. That is, the target response setter 4 has the flat frequency response of gain 1 shown in FIG. 18(a) and the impulse response with delay time d shown in FIG. 18(b).
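The delay-only behavior of the target response setter 4 can be illustrated with a minimal Python sketch. The function name `target_response` and the tap count of 64 are assumptions chosen for illustration and do not appear in the source.

```python
import numpy as np

def target_response(num_taps):
    """Impulse response of a pure delay of d = num_taps // 2 samples with gain 1.

    Hypothetical helper: models only the delay-and-flat-gain behavior
    described for the target response setter 4.
    """
    d = num_taps // 2
    h = np.zeros(num_taps)
    h[d] = 1.0  # single unit impulse at delay d, as in FIG. 18(b)
    return h, d

# Assumed tap length of 64, so the delay d is 32 samples.
h, d = target_response(64)

# Magnitude response of a delayed unit impulse is 1 at every frequency bin,
# corresponding to the flat gain-1 characteristic of FIG. 18(a).
magnitude = np.abs(np.fft.rfft(h))
```

A delay of half the tap length places the target in the middle of the filter's time span, so the adaptive filter 3b can approximate responses that lead or lag the reference signal while remaining causal.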
Returning to FIG. 17, 5 denotes a subtracter that subtracts the output signal of the adaptive filter 3b from the target response output from the target response setter 4, and outputs the error signal e.
While voice recognition is not being performed, the microphones 1, 2 receive only noise, and the adaptive signal processor 3 determines the filter coefficient W so as to minimize the power of the error signal e, that is, the noise output. During voice recognition, on the other hand, the adaptive signal processor 3 does not update the filter coefficient; it sets the filter coefficient W determined during the non-recognition period in the adaptive filter 3b and outputs a voice signal.
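The two operating phases described above (adapt on noise only, then hold the coefficients fixed) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of FIG. 17: the function name `lms_noise_canceller`, the step size `mu`, the tap count, and the synthetic noise paths (0.8 and 0.3) are all hypothetical, and the decision about when voice is present is assumed to be made elsewhere.

```python
import numpy as np

def lms_noise_canceller(x1, x2, num_taps=8, mu=0.01, adapt=True, w_init=None):
    """Two-microphone LMS canceller: e[n] = x1[n-d] - sum_k w[k] * x2[n-k].

    x1: signal of microphone 1 (target), x2: signal of microphone 2 (reference).
    adapt=False holds the coefficients fixed, as during voice recognition.
    """
    d = num_taps // 2  # target delay of half the tap length
    w = np.zeros(num_taps) if w_init is None else np.array(w_init, dtype=float)
    buf = np.zeros(num_taps)  # most recent reference samples, newest first
    e = np.zeros(len(x1))
    for n in range(len(x1)):
        buf = np.roll(buf, 1)
        buf[0] = x2[n]
        target = x1[n - d] if n >= d else 0.0  # delayed target response
        e[n] = target - w @ buf                # subtracter output
        if adapt:
            w = w + mu * e[n] * buf            # LMS coefficient update
    return e, w

# Synthetic noise-only data (assumed): one white noise source reaching
# microphone 2 directly and microphone 1 through a short two-tap path.
rng = np.random.default_rng(0)
xn = rng.standard_normal(5000)
x2 = xn
x1 = 0.8 * xn + 0.3 * np.concatenate(([0.0], xn[:-1]))

# Phase 1: no voice present, coefficients adapt to minimize the power of e.
e_train, w = lms_noise_canceller(x1, x2, adapt=True)

# Phase 2: coefficients determined in phase 1 are held fixed.
e_frozen, w_after = lms_noise_canceller(x1, x2, adapt=False, w_init=w)
```

In this noise-only example the residual error power falls toward zero once the coefficients converge, and the second call leaves the coefficients unchanged.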
Ideally, the system shown in FIG. 17 outputs only a voice signal Xs(z), with zero noise output, during voice recognition. In other words, the noise output En(z) is given by the following expression:
En(z) = Xn1(z)·z^{-d} − Xn2(z)·W(z)  (1)
and the ideal condition is that, when the adjustable parameters (the coefficient W of the adaptive filter 3b) are determined so as to minimize the power of the error signal e, the voice output Es(z) satisfies the following expression (2):
Es(z) = Xs1(z)·z^{-d} − Xs2(z)·W(z) ≈ Xs(z)  (2)
Here, Xn1(z), Xn2(z) are the noises contained in the output signals from the microphones 1, 2, and given that the propagation characteristics from a noise source (noise=xn) to the first and second microphones 1, 2 are CN1, CN2,
Xn1(z)=CN1·xn
Xn2(z)=CN2·xn
expression (1) is reduced to the following:
En(z) = (CN1·z^{-d} − CN2·W(z))·xn  (1′)
Further, Xs1(z), Xs2(z) are the voice signals contained in the output signals from the microphones 1, 2, and given that the propagation characteristics from the mouth of a speaker (speaker's voice=xs) to the first and second microphones 1, 2 are CS1, CS2,
Xs1(z)=CS1·xs
Xs2(z)=CS2·xs
expression (2) is reduced to the following:
Es(z) = (CS1·z^{-d} − CS2·W(z))·xs  (2′)
Under the actual conditions in a car passenger compartment, there are many noise sources, and the coherence of the noises picked up by the microphones 1, 2 tends to decrease as the distance between the microphones 1, 2 increases. Accordingly, as the two microphones 1, 2 are moved further apart, the noise output given by expression (1) becomes greater, so the microphones 1, 2 need to be laid out as close together as possible.
However, if they are laid out as close together as possible, the two microphones 1, 2 will likely receive voice and noise of virtually the same level and components. In that case, if the noise is eliminated using the adaptive filter coefficient W that is optimal for removing the noise, the voice is eliminated as well. Conversely, if the adaptive filter coefficient W is determined so as to satisfy expression (2), the voice is not damaged, but the noise is hardly eliminated either, so the SN ratio is hardly improved. This is a problem to be solved.
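The trade-off can be made explicit with a short derivation from expressions (1′) and (2′). It is a sketch under simplifying assumptions that go beyond the source: a single noise source, and a propagation characteristic CN2 that can be divided out.

```latex
% Noise is cancelled exactly in (1') when the bracket vanishes:
C_{N1}\, z^{-d} - C_{N2}\, W(z) = 0
\quad\Longrightarrow\quad
W_{\mathrm{opt}}(z) = \frac{C_{N1}}{C_{N2}}\, z^{-d}

% Substituting W_opt into (2') gives the voice output:
E_s(z) = \left( C_{S1} - C_{S2}\,\frac{C_{N1}}{C_{N2}} \right) z^{-d}\, x_s

% Closely spaced microphones give nearly equal path ratios:
\frac{C_{S1}}{C_{S2}} \approx \frac{C_{N1}}{C_{N2}}
\quad\Longrightarrow\quad
E_s(z) \approx 0
```

Under these assumptions, the coefficient that nulls the noise also nulls the voice whenever the voice and noise reach the two microphones through nearly the same ratio of propagation characteristics.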
Thus, to maximize the suppression of noise, it is desirable to place the two microphones adjacent to each other, whereas to minimize the suppression of the voice, it is desirable to place them far apart. These two conditions cannot be satisfied at the same time. Therefore, the conventional microphone system has the disadvantage that the SN ratio of the voice signal cannot be improved significantly.