1. Field of the Invention
The present invention relates to a microphone array for detecting the direction and the position of a sound source, enhancing a desired signal and suppressing noise by performing signal processing based on signals inputted from arrayed microphones.
2. Description of the Related Art
A microphone array includes a plurality of real microphones connected in an array and processes signals received by the real microphones so that directivity can be provided.
In a microphone array, an SN (signal-to-noise) ratio can be improved by two approaches, namely, enhancement of a desired signal coming from a look direction and suppression of unnecessary noise. A conventional microphone array according to each approach will be described below.
FIG. 25 is a view showing an example of the structure of a conventional microphone array, which is a so-called delay-and-sum array. The delay-and-sum array shown in FIG. 25 includes a plurality of real microphones 2501, a plurality of delay units 2502 corresponding to the respective real microphones and an adder 2503.
The delay-and-sum array enhances a desired signal coming from a look direction by utilizing a time lag generated when a sound wave coming from the look direction reaches the plurality of real microphones. FIG. 26 is a view illustrating enhancement of a desired signal in the delay-and-sum array. In FIG. 26, a sound wave that can be approximated to a plane wave is received at two microphones 2601 and 2602 in a free space. In FIG. 26, a bold arrow denotes a propagation direction of the sound wave, and a broken line denotes a wavefront. The two real microphones 2601 and 2602 are separated by a distance d.
It is assumed that a sound wave comes from a look direction θ and that the signal received at the real microphone 2602 is delayed against the signal received at the real microphone 2601 by a time lag τ during which the sound wave travels a distance ξ. This can be expressed by the following equations:
x2(t) = x1(t − τ)
τ = ξ/c = d·(sin θ)/c,
where c represents the velocity of sound. When the signal received at the real microphone 2601 is delayed for a delay period τ, the two received signals that were previously separated by a time lag become in-phase on the time axis. On the other hand, sound waves coming from directions other than the look direction are received at the real microphones with time lags different from the time lag τ, so that the signals are not processed to be in-phase by this delay operation. In other words, the above-described delay operation makes it possible to enhance the desired signal coming from the look direction.
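As an illustration of the relationship τ = d·(sin θ)/c, the following minimal sketch computes the time lag for a given microphone spacing and look direction. The function names, the value of 343 m/s for the velocity of sound, and the 16 kHz sampling rate are assumptions made for this example and do not appear in the description.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def time_lag(d, theta_deg, c=SPEED_OF_SOUND):
    """Return tau = d * sin(theta) / c in seconds."""
    return d * math.sin(math.radians(theta_deg)) / c


def lag_in_samples(d, theta_deg, fs, c=SPEED_OF_SOUND):
    """Return the same lag as a (possibly fractional) number of samples."""
    return time_lag(d, theta_deg, c) * fs


# Hypothetical example: microphones 0.1 m apart, look direction of 30 degrees.
tau = time_lag(0.1, 30.0)
print(f"tau = {tau * 1e6:.1f} microseconds")
print(f"lag = {lag_in_samples(0.1, 30.0, 16000):.2f} samples at 16 kHz")
```

A wave arriving from θ = 0 reaches both microphones simultaneously, so the computed lag is zero in that case.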
The delay-and-sum array shown in FIG. 25 processes an input signal from each real microphone 2501 to be in-phase with the delay unit 2502, and then the signals are added by the adder 2503, so that the desired signal coming from the look direction can be enhanced.
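The delay-and-sum operation can be sketched for two microphones as follows. This is a simplified illustration, not the circuit of FIG. 25: delays are restricted to whole samples, the test signal is a 1 kHz tone, and the lags of 3 and 8 samples for the look direction and for another direction are arbitrary assumed values.

```python
import math


def delay(signal, k):
    """Delay a sequence by k >= 0 samples, padding the start with zeros."""
    return [0.0] * k + signal[:len(signal) - k]


def delay_and_sum(x1, x2, steer):
    """Delay x1 by `steer` samples to align it with x2, then average."""
    x1_delayed = delay(x1, steer)
    return [(a + b) / 2.0 for a, b in zip(x1_delayed, x2)]


def power(signal):
    """Mean square value of a sequence."""
    return sum(v * v for v in signal) / len(signal)


FS = 16000                      # sampling rate in Hz (assumed)
wave = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(1600)]

STEER = 3                       # lag of the look direction, in samples
x2_look = delay(wave, STEER)    # second microphone, wave from look direction
x2_other = delay(wave, 8)       # second microphone, wave from elsewhere

p_look = power(delay_and_sum(wave, x2_look, STEER))
p_other = power(delay_and_sum(wave, x2_other, STEER))
print(p_look, p_other)
```

With the array steered to the look direction, the two channels add coherently for the first wave and only partially for the second, so the output power is larger for the look direction.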
Next, a conventional microphone array according to the approach of noise suppression will be described. FIG. 27 shows an example of the structure of a microphone array that suppresses noise. The microphone array shown in FIG. 27 is called a subtraction type array. The subtraction type array shown in FIG. 27 includes two real microphones 2701 and 2702, a delay unit 2703, a subtracter 2704, and a desired signal correction filter 2705.
In the subtraction type array, when noise coming only from a direction θ is received at the two real microphones 2701 and 2702, the relationship expressed by the equation x2(t) = x1(t − τ) is satisfied. In this case, x1(t) is delayed by the time τ so that the noise components included in the two received signals become in-phase, as in the case of the delay-and-sum array. Then, the in-phase noise is subtracted so that those noise components are erased.
However, the direction θ of the noise is unknown in many cases. Therefore, the value of τ is unknown. Then, as shown in FIG. 27, information about an output e(t) from the subtracter 2704 is fed back to the delay unit 2703 so that the amount of delay is adjusted to minimize the power of the output e(t).
If the received signals consist only of noise coming from the direction θ, e(t) becomes zero, which is the minimum, when the amount of delay becomes τ. According to this approach, even if the value of θ is unknown, noise can be erased by a subtraction process.
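The adjustment of the delay so as to minimize the power of e(t) can be illustrated with the sketch below. It replaces the feedback loop of FIG. 27 with an exhaustive search over whole-sample delays, which is an assumption made for simplicity; the white-noise test signal, the true lag of 4 samples and the search range are likewise hypothetical.

```python
import random


def delay(signal, k):
    """Delay a sequence by k >= 0 samples, padding the start with zeros."""
    return [0.0] * k + signal[:len(signal) - k]


def residual_power(x1, x2, k, skip):
    """Power of e(t) = x2(t) - x1(t - k), ignoring the first `skip` samples."""
    x1_delayed = delay(x1, k)
    e = [b - a for a, b in zip(x1_delayed[skip:], x2[skip:])]
    return sum(v * v for v in e) / len(e)


def find_delay(x1, x2, max_lag):
    """Return the delay in 0..max_lag that minimizes the power of e(t)."""
    return min(range(max_lag + 1),
               key=lambda k: residual_power(x1, x2, k, max_lag))


rng = random.Random(0)          # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x1 = noise
x2 = delay(noise, 4)            # noise reaches the second microphone 4 samples later

best = find_delay(x1, x2, 10)
print(best, residual_power(x1, x2, best, 10))
```

When the candidate delay equals the true lag, the two channels are identical after the initial padding and the residual power is zero.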
On the other hand, if a desired signal comes from a direction other than the direction θ, the desired signal components in the two received signals are not made in-phase by the above-described operation. Therefore, the desired signal is not erased by the subtraction. The frequency components of the desired signal, however, are changed by the subtraction. Therefore, as shown in FIG. 27, a desired signal correction filter 2705 is provided to correct this change.
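The change in frequency components caused by the subtraction can be made explicit under a simple assumption that is not stated in the description: the desired signal s(t) is a plane wave that reaches the real microphone 2702 with a lag τ_s different from τ, and the subtracter forms e(t) = x2(t) − x1(t − τ). The symbols τ_s, S(ω) and E(ω), the latter two denoting the spectra of s(t) and e(t), are introduced here only for illustration.

```latex
\begin{aligned}
x_1(t) &= s(t), \qquad x_2(t) = s(t-\tau_s),\\
e(t) &= x_2(t) - x_1(t-\tau) = s(t-\tau_s) - s(t-\tau),\\
E(\omega) &= \left(e^{-j\omega\tau_s} - e^{-j\omega\tau}\right) S(\omega),\\
\left|\frac{E(\omega)}{S(\omega)}\right| &= 2\left|\sin\frac{\omega(\tau-\tau_s)}{2}\right|.
\end{aligned}
```

Under this assumption the subtraction attenuates low frequencies most strongly, with zero gain at ω = 0, so a correction filter would have to boost the frequencies at which this gain is small and cannot be an exact inverse at ω = 0.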
When noise comes from a small number of directions, the subtraction type array can provide an effective improvement in the SN ratio, even if the subtraction type array is small.
However, when using the delay-and-sum array or the subtraction type array, it is necessary to increase the number of real microphones in order to improve the enhancement of a desired signal, the suppression of noise and the performance for detecting the position of the sound source, thus causing the problem of upsizing the array.
Therefore, with the foregoing in mind, it is an object of the present invention to provide a compact and high-performance microphone array with a small number of real microphones that can provide substantially the same quality as a microphone array with a large number of real microphones.
In order to achieve the object, a microphone array of the present invention comprises a plurality of real microphones arranged in predetermined positions, at least one virtual microphone, and a sound signal estimator for estimating a sound signal received by the virtual microphone. The sound signal estimator comprises a sound signal divider for dividing, based on sound signals received by the plurality of real microphones, a sound signal received by a predetermined real microphone into components, each component corresponding to one coordinate axis direction in a coordinate system that is defined on the basis of positions of the plurality of real microphones; a sound signal component estimator for estimating a virtual microphone sound signal component corresponding to a predetermined coordinate axis direction in the coordinate system, based on the sound signal received by the predetermined real microphone and the sound signal component corresponding to the predetermined coordinate axis direction divided by the sound signal divider; and a sound signal component adder for adding the sound signal component corresponding to the coordinate axis direction divided by the sound signal divider and the sound signal component, each component corresponding to one coordinate axis direction estimated by the sound signal component estimator.
In one embodiment of the present invention, the microphone array further comprises at least one delay element for performing delay processing on each sound signal so that sound signals received by the plurality of real microphones and sound signals estimated by the sound signal estimator are in-phase; and an adder for adding signals that have been processed by the delay elements. This embodiment makes it possible to enhance a desired signal by using the estimated sound signal. Furthermore, by subtracting the signal that has been processed in the delay element, it is possible to suppress noise by using the estimated signal.
In another embodiment of the present invention, the microphone array further comprises a correlation coefficient calculator for calculating correlation coefficients based on sound signals received by the predetermined real microphone and a sound signal estimated by the sound signal estimator; and a sound source position estimator for estimating a position of a sound source based on the correlation coefficients calculated by the correlation coefficient calculator. Correlation coefficients indicate the correlation between two signals. For example, it is generally known that the position of a source of a desired signal can be estimated by calculating the correlation coefficients between sound signals received by any two real microphones based on a predetermined equation and performing a predetermined process on the calculated results. Therefore, the calculation of correlation coefficients of the estimated sound signals makes it possible to estimate the position of the sound source more precisely.
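One common way of using correlation coefficients between two microphone signals is sketched below: the lag at which the coefficient peaks is taken as the time lag τ and converted to a direction through τ = d·(sin θ)/c. The predetermined equation and process referred to above are not given in this section, so this sketch stands in for them as an assumption. It yields a direction rather than a full position, and the spacing, sampling rate and test signal are hypothetical.

```python
import math
import random


def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_lag(x1, x2, max_lag):
    """Lag k in -max_lag..max_lag for which x2[n] best matches x1[n - k]."""
    def coeff(k):
        if k >= 0:
            return correlation_coefficient(x1[:len(x1) - k], x2[k:])
        return correlation_coefficient(x1[-k:], x2[:len(x2) + k])
    return max(range(-max_lag, max_lag + 1), key=coeff)


def lag_to_direction(k, d, fs, c=343.0):
    """Direction in degrees from tau = d*sin(theta)/c, with tau = k/fs."""
    s = max(-1.0, min(1.0, c * k / (fs * d)))
    return math.degrees(math.asin(s))


rng = random.Random(1)          # fixed seed so the run is repeatable
x1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x2 = [0.0] * 5 + x1[:-5]        # second microphone lags the first by 5 samples
k = best_lag(x1, x2, 9)
print(k, lag_to_direction(k, d=0.2, fs=16000))
```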
A second microphone array of the present invention including a plurality of real microphones connected in an array comprises a sound signal divider for dividing, based on sound signals received by the plurality of real microphones, a sound signal received by a predetermined real microphone into components, each corresponding to one coordinate axis direction in a coordinate system defined on the basis of the positions of the plurality of real microphones. This embodiment makes it possible to separate voices of two speakers when speaker A exists on one coordinate axis and another speaker B exists in a direction perpendicular to the coordinate axis.
In one embodiment of the second microphone array of the present invention, the microphone array further comprises a sound power calculator for calculating a sound power of a component corresponding to a coordinate axis direction based on the sound signal component corresponding to a coordinate axis direction divided by the sound signal divider; and a sound source direction estimator for estimating a direction of a sound source based on the sound power calculated by the sound power calculator. This embodiment is advantageous, because an angle to a predetermined coordinate axis when the sound source is viewed from the position of the predetermined real microphone can be estimated, based on the ratio of sound powers of sound signal components, each component corresponding to each of the coordinate axis directions.
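The estimation of an angle from a ratio of sound powers can be illustrated as follows. The sketch assumes, without support from this section, that the amplitudes of the components along two perpendicular coordinate axes scale with cos θ and sin θ, so that the power ratio equals tan²θ. The function names and the test signal are hypothetical, and the actual relationship used by the sound source direction estimator may differ.

```python
import math


def power(signal):
    """Mean square value of a sequence."""
    return sum(v * v for v in signal) / len(signal)


def direction_from_powers(p_x, p_y):
    """Angle to the x axis in degrees (0..90), assuming component amplitudes
    scale with cos(theta) and sin(theta), so that p_y / p_x = tan(theta)**2."""
    return math.degrees(math.atan2(math.sqrt(p_y), math.sqrt(p_x)))


# Hypothetical source at 35 degrees from the x axis.
theta = math.radians(35.0)
s = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(1600)]
s_x = [v * math.cos(theta) for v in s]   # component along the x axis
s_y = [v * math.sin(theta) for v in s]   # component along the y axis

estimate = direction_from_powers(power(s_x), power(s_y))
print(estimate)
```

Because only powers are used, the sign of each component is lost and the estimate is confined to the range of 0 to 90 degrees.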
A third microphone array of the present invention including a plurality of real microphones and at least one virtual microphone comprises a sound signal divider for dividing, based on sound signals received by the plurality of real microphones, a sound signal received by a predetermined real microphone into components, each corresponding to one coordinate axis direction in a coordinate system defined on the basis of positions of the plurality of real microphones; a sound signal component estimator for estimating a virtual microphone sound signal component corresponding to a coordinate axis direction in the coordinate system; a sound power calculator for calculating sound powers of components, each corresponding to a coordinate axis direction of a sound signal received by the real microphone and a virtual microphone sound signal, based on the sound signal component divided by the sound signal divider and the sound signal component estimated by the sound signal component estimator; and a sound source position estimator for estimating a position of a sound source based on the sound powers calculated by the sound power calculator.
The calculation of sound powers of estimated sound signals makes it possible to estimate angles to a predetermined coordinate axis when the sound source is viewed from a plurality of positions. Therefore, the position of the sound source can be estimated in a more limited range.
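How angles observed from a plurality of positions narrow down the position of the sound source can be sketched in two dimensions by intersecting two bearing lines. This is a generic triangulation under assumed conditions (angles measured from the x axis, noise-free estimates); the description does not specify the computation, and the function name and coordinates are hypothetical.

```python
import math


def intersect_bearings(p1, theta1_deg, p2, theta2_deg):
    """Intersect two lines that start at p1 and p2 and point along the
    given angles (degrees from the x axis). Returns None if parallel."""
    d1 = (math.cos(math.radians(theta1_deg)), math.sin(math.radians(theta1_deg)))
    d2 = (math.cos(math.radians(theta2_deg)), math.sin(math.radians(theta2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]       # 2D cross product d1 x d2
    if abs(denom) < 1e-12:
        return None
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1.
    t1 = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Hypothetical source at (1, 2) observed from (0, 0) and (0.5, 0).
angle_a = math.degrees(math.atan2(2.0, 1.0))
angle_b = math.degrees(math.atan2(2.0, 0.5))
position = intersect_bearings((0.0, 0.0), angle_a, (0.5, 0.0), angle_b)
print(position)
```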
A fourth microphone array of the present invention including a plurality of real microphones comprises a rotator for rotating the microphone array; a rotation controller for controlling a rotation angle of the rotator; a correlation coefficient calculator for obtaining the rotation angle of the rotator and calculating correlation coefficients for each angle based on sound signals received by the plurality of real microphones; and a sound source position estimator for comparing the correlation coefficients calculated by the correlation coefficient calculator for each angle and estimating a position of a sound source based on results of the comparison.
By rotating the microphone array and calculating correlation coefficients for every angle of rotation, it is possible to determine the direction of the source of the desired signal precisely. Therefore, it is possible to enhance the desired signal or suppress noise more precisely, based on sound signals received by the microphone array including a plurality of microphones. Furthermore, it is possible to estimate the direction of the sound source by calculating the ratio of powers instead of the correlation coefficients.
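The comparison of correlation coefficients over rotation angles can be sketched with a simulated two-microphone array. The model is an assumption made for illustration: a plane wave arrives from a fixed direction, rotating the array by an angle φ changes the time lag to d·sin(θ − φ)/c, and the coefficient between the two channels is largest when the lag is zero. The test signal, the spacing of 0.1 m and the 5-degree scan step are hypothetical.

```python
import math

C = 343.0  # velocity of sound in m/s (assumed)


def tone_mix(t):
    """Test signal: three low-frequency tones (an arbitrary choice)."""
    return sum(math.sin(2 * math.pi * f * t) for f in (300.0, 500.0, 700.0))


def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def record_pair(source_deg, rotation_deg, d=0.1, fs=16000, n=1600):
    """Simulate two microphones spaced d apart on an array rotated by
    rotation_deg, receiving a plane wave from source_deg."""
    lag = d * math.sin(math.radians(source_deg - rotation_deg)) / C
    x1 = [tone_mix(i / fs) for i in range(n)]
    x2 = [tone_mix(i / fs - lag) for i in range(n)]
    return x1, x2


def estimate_direction(source_deg, angles):
    """Rotation angle whose zero-lag correlation coefficient is largest."""
    scores = {a: correlation_coefficient(*record_pair(source_deg, a))
              for a in angles}
    return max(scores, key=scores.get)


estimate = estimate_direction(25.0, range(-90, 91, 5))
print(estimate)
```

The scan is limited to the range of −90 to 90 degrees because a two-microphone array cannot distinguish a source in front of it from one behind it.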
In one embodiment of the fourth microphone array of the present invention, the microphone array further comprises a position detector for detecting a position of the microphone array. The sound source position estimator compares correlation coefficients calculated by the correlation coefficient calculator for every position detected by the position detector and every rotation angle so as to estimate a position of a sound source based on results of the comparison.
A fifth microphone array of the present invention including a plurality of real microphones comprises at least one delay element for performing delay processing on a sound signal received by each of the plurality of real microphones so that sound signals received by the plurality of real microphones are in-phase; an adder for adding signals that have been processed by the delay elements; an image capturer for capturing an image of a sound source; a sound source position detector for detecting a position of the sound source based on an output from the image capturer; and a delay controller for controlling delay processing by the delay element based on the position of the sound source detected by the sound source position detector.
This embodiment, which includes an image capturer for finding the sound source, is especially effective in an environment with a high noise level, because the desired signal enhancement process is performed while detecting the position of the sound source. Likewise, when a noise suppression process is performed while detecting the position of a specific noise source such as a speaker, this embodiment is effective in suppressing a specific noise, i.e., echo or howling.
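The control of the delay processing from a detected source position can be sketched as follows. The sketch assumes that the sound source position detector supplies two-dimensional coordinates in the same frame as the microphone positions and that sound travels in straight lines at 343 m/s; the function name and the example geometry are hypothetical.

```python
import math


def steering_delays(source, mics, c=343.0):
    """Per-microphone delays in seconds that bring a wave emitted at
    `source` into phase across all microphones in `mics`."""
    travel = [math.hypot(source[0] - mx, source[1] - my) / c
              for mx, my in mics]
    latest = max(travel)
    # The microphone reached last needs no delay; the others wait for it.
    return [latest - t for t in travel]


# Hypothetical geometry: three microphones on the x axis, source 1 m away.
mics = [(-0.1, 0.0), (0.0, 0.0), (0.1, 0.0)]
delays = steering_delays((0.0, 1.0), mics)
print([f"{d * 1e6:.1f} us" for d in delays])
```

Each time the detected position changes, the delays would be recomputed and passed to the delay elements before the signals are added.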