1. Field of the Invention
The present invention relates generally to techniques to determine the location of an acoustic source, such as determining a direction to an individual who is talking. More particularly, the present invention is directed towards using two or more pairs of microphones to determine a direction to an acoustic source.
2. Description of Background Art
There are a variety of applications for which it is desirable to use an acoustic technique to determine the approximate location of an acoustic source. For example, in some audio-visual applications it is desirable to use an acoustic technique to determine the direction to the person who is speaking so that a camera may be directed at the person speaking.
The time delay associated with an acoustic signal traveling along two different paths to reach two spaced-apart microphones can be used to calculate a surface of potential acoustic source positions. As shown in FIG. 1A, the microphones 105, 110 of a pair are separated from each other by a distance D. The separation between the microphones creates a potential difference in the acoustic path lengths from the acoustic source 102 to the two microphones. For example, suppose acoustic source 102 has a shorter acoustic path length, L1, to microphone 110 compared with the acoustic path length, L2, from acoustic source 102 to microphone 105. The difference in acoustic path length, ΔL=L2−L1, leads, in turn, to an offset in the time of arrival of the acoustic signals received by the microphones 105 and 110. This time delay can be expressed mathematically as: ΔTd=ΔL/c, where ΔTd is the time delay of sound reaching the two microphones, ΔL is the differential path length from the acoustic source to the two microphones, and c is the speed of sound.
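As a purely illustrative sketch of the relation ΔTd=ΔL/c, the following Python function computes the arrival-time offset for a given source and microphone geometry. The function name, the use of coordinate tuples, and the default speed of sound of 343 m/s are assumptions made for this example and are not taken from the figures.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly room temperature


def time_delay(source, mic_near, mic_far, c=SPEED_OF_SOUND):
    """Return the delay (L2 - L1) / c in seconds.

    source, mic_near and mic_far are coordinate tuples in meters
    (hypothetical inputs chosen for illustration).
    """
    l1 = math.dist(source, mic_near)  # shorter path length L1
    l2 = math.dist(source, mic_far)   # longer path length L2
    return (l2 - l1) / c


# Source on the microphone axis, microphones 0.2 m apart:
delay = time_delay((1.0, 0.0), (0.1, 0.0), (-0.1, 0.0))
```

For a source on the line through both microphones, the path difference equals the full separation D, so the delay takes its largest possible magnitude, D/c.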
A particular time delay, ΔTd, has a corresponding hyperbolic equation defining a surface of potential acoustic source locations for which the differential path length (and hence ΔTd) is constant. This hyperbolic equation can be expressed in the x-y plane about the center line connecting a microphone pair as: $x^2/a^2 - y^2/b^2 = 1$, where $a = \Delta T_d/2$, $b = \sqrt{(D/2c)^2 - a^2}$, and D is the microphone separation of the microphone pair. Beyond a distance of about 2D from the midpoint 114 between the microphones, the hyperboloid for a particular ΔTd can be approximated by an asymptotic cone 116 with a fixed angle θ, as shown in FIG. 1B. The axis of the cone is co-axial with the line between the two microphones of the pair.
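The asymptotic-cone approximation can be checked numerically. The sketch below assumes that θ is measured from the line joining the two microphones and that c is 343 m/s; neither is stated in the text above. It takes the cone angle to be the angle of the hyperbola's asymptote, tan θ = b/a, which reduces to cos θ = c·ΔTd/D.

```python
import math


def cone_angle(delay, separation, c=343.0):
    """Angle (radians) between the microphone axis and the asymptotic cone.

    delay is the time delay in seconds and separation is D in meters.
    The function name and the default c are assumptions for illustration.
    """
    a = delay / 2.0
    half_focal = separation / (2.0 * c)  # D / 2c
    if abs(a) > half_focal:
        raise ValueError("delay exceeds D/c, so no real hyperbola exists")
    b = math.sqrt(half_focal ** 2 - a ** 2)
    # Asymptote slope is b/a; atan2 also handles a == 0 (broadside source).
    return math.atan2(b, a)


broadside = cone_angle(0.0, 0.2)          # zero delay: cone opens to a plane
oblique = cone_angle(0.1 / 343.0, 0.2)    # c * delay / D = 0.5
```

A zero delay gives an angle of 90 degrees, meaning the cone degenerates into the plane midway between the microphones, while a delay of magnitude D/c collapses the cone onto the microphone axis.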
The cone of potential acoustic source locations associated with a single pair of spaced-apart microphones typically does not provide sufficient resolution of the direction to an acoustic source. Additionally, a single cone provides information sufficient to localize the acoustic source in only one dimension. Consequently, it is desirable to use the information from two or more microphone pairs to increase the resolution.
One conventional method to calculate source direction is the so-called “cone intersection” method. As shown in FIG. 2, four microphones may be arranged into a rectangular array of microphones consisting of a first pair of microphones 105, 110 and a second orthogonal pair of microphones 130 and 140. For each pair of microphones, a single respective cone 240, 250 of potential acoustic source locations is calculated. The cones intersect along two regions, although in many applications one of the intersection regions may be eliminated as an invalid solution or an algorithm may be used to eliminate one of the intersecting regions as an invalid intersection. The valid geometrical intersection of the two cones is then used to calculate a bearing line 260 indicating the direction to the acoustic source 102.
The cone intersection method provides satisfactory results for many applications. However, there are several drawbacks to the cone intersection method. In particular, the cone-intersection method is often not as robust as desired in applications where there is substantial noise and reverberation.
The intersection of cones method requires an accurate time delay estimate (TDE) in order to calculate parameters for the two cones used to calculate the bearing vector to the acoustic source. However, conventional techniques to calculate TDEs from the peak of a correlation function can be susceptible to significant errors when there is substantial noise and reverberation.
Conventional techniques to calculate the cross-correlation function do not permit the effects of noise and reverberation to be completely eliminated. For a source signal s(n) propagating through a generic free space with noise, the signal xi(n) acquired by the ith microphone has been traditionally modeled as follows: $x_i(n) = \alpha_i\, s(n - \tau_i) + \xi_i(n)$, where αi is an attenuation factor due to propagation loss, τi is the propagation time, and ξi(n) is the additive noise and reverberation. Reverberation is the algebraic sum of all the echoes and can be a significant effect, particularly in small, enclosed spaces, such as office environments and meeting rooms. There are several techniques commonly used to calculate the cross-correlation of the two signals of each microphone pair. The classical cross-correlation (CCC) function for each microphone pair, Cij, can be expressed mathematically, here for microphones 1 and 2, as $C_{12}(\tau) = x_1(n) * x_2(n) = \sum_n x_1(n)\, x_2(n - \tau)$. This is equivalent to $C_{12}(\tau) = F^{-1}\{X_1(f)\, X_2^*(f)\}$, where F denotes the Fourier transform, F⁻¹ its inverse, and Xi(ƒ) the transform of xi(n). CCC requires the least computation of the commonly used correlation techniques.
However, in a typical office environment, reverberations from walls, furniture, and other objects broaden the correlation function, leading to potential errors in calculating the physical time delay from the peak of the cross-correlation function.
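The classical cross-correlation sum and the peak-picking step can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the restriction to lags between −max_lag and +max_lag, and the treatment of samples outside the recorded window as zero are all assumptions made for the example.

```python
def classical_cross_correlation(x1, x2, max_lag):
    """Return {lag: C12(lag)} with C12(lag) = sum over n of x1[n] * x2[n - lag].

    Samples outside the recorded window are treated as zero (an assumption).
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, sample in enumerate(x1):
            m = n - lag
            if 0 <= m < len(x2):
                total += sample * x2[m]
        result[lag] = total
    return result


def estimate_delay_samples(x1, x2, max_lag):
    """Time delay estimate, in samples, taken from the correlation peak."""
    corr = classical_cross_correlation(x1, x2, max_lag)
    return max(corr, key=corr.get)


# x1 is x2 delayed by two samples, so the peak is expected at lag 2.
x2 = [0.0, 0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
x1 = [0.0, 0.0] + x2[:-2]
lag = estimate_delay_samples(x1, x2, max_lag=4)
```

With this sign convention a positive lag means that the signal at microphone 1 arrives later than the signal at microphone 2. Dividing the lag by the sampling rate converts it to seconds.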
Filtering can improve the accuracy of estimating a TDE from a cross-correlation function. In particular, adding a pre-filter Ψ(ƒ) results in what is known as the generalized cross-correlation (GCC) function, which can be expressed as: $R_{12}(\tau) = F^{-1}\{\Psi(f)\, X_1(f)\, X_2^*(f)\}$, which describes a family of cross-correlation functions that include a filtering operation. The three most common choices of Ψ(ƒ) are classical cross-correlation (CCC), phase transform (PHAT), and maximum likelihood (ML). A fourth choice, normalized cross-correlation (NCC), is a slight variant of CCC. PHAT is a prewhitening filter, $\Psi(f) = 1/|X_1(f)\, X_2^*(f)|$, that normalizes the cross-power spectrum to remove all magnitude information, leaving only the phase.
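The PHAT weighting can be illustrated with the following sketch, which uses only the Python standard library. The naive O(N²) discrete Fourier transform is used solely to keep the example self-contained; a practical implementation would normally use a fast Fourier transform. The function names, the small constant eps that guards against division by zero, and the circular (wrap-around) treatment of lags are assumptions of this example.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(x1, x2, eps=1e-12):
    """Return R12 for circular lags 0..N-1 using the PHAT pre-filter."""
    spec1, spec2 = dft(x1), dft(x2)
    cross = [a * b.conjugate() for a, b in zip(spec1, spec2)]
    # PHAT: divide by the magnitude so that only the phase remains.
    whitened = [c / max(abs(c), eps) for c in cross]
    return [v.real for v in dft(whitened, inverse=True)]


def peak_lag(r):
    """Index of the peak, mapped to a signed lag in samples."""
    n = len(r)
    k = max(range(n), key=lambda i: r[i])
    return k if k <= n // 2 else k - n


# x1 is x2 circularly delayed by two samples.
x2 = [1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0]
x1 = [0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0]
lag = peak_lag(gcc_phat(x1, x2))
```

In this noise-free case the whitened spectrum is a pure phase ramp, so the result is a single sharp peak at the true lag rather than a copy of the signal's autocorrelation.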
However, even the use of a generalized cross-correlation function does not always permit an accurate, robust determination of the TDEs used in the intersection of cones method. Referring again to FIG. 2, the intersection of cones method presumes that: 1) the TDE used to calculate the angle of each of the two cones is an accurate estimate of the physical time offset for acoustic signals to reach the two microphones of each pair from the acoustic source; and 2) the two cones intersect. However, these assumptions are not necessarily true. The TDE of each pair of microphones is estimated from the peak of the cross-correlation function and may have a significant error if the cross-correlation function is broadened by noise and reverberation. Additionally, in many real-world applications, there are “blind spots” associated with the fact that there are acoustic source locations for which the two cones do not have an intersection.
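The dependence of the intersection of cones method on both assumptions can be seen in a simplified far-field sketch. The example assumes, for illustration only, that the two microphone pairs lie along the x and y axes, that both cones share an apex at the array center, that a positive delay means the source is nearer the microphone on the positive axis, and that the source lies in the half-space z ≥ 0. Under these assumptions each TDE fixes one direction cosine of the bearing line, and the cones fail to intersect when the squared direction cosines sum to more than one. This simplified model does not capture every blind spot that may arise in a real array.

```python
import math


def bearing_from_two_cones(delay_x, delay_y, sep_x, sep_y, c=343.0):
    """Return a unit bearing vector (ux, uy, uz), or None if the cones miss.

    delay_x and delay_y are the TDEs (seconds) of the pairs on the x and y
    axes; sep_x and sep_y are their separations (meters). All names and the
    default c are assumptions for illustration.
    """
    ux = c * delay_x / sep_x  # cosine of the angle to the x axis
    uy = c * delay_y / sep_y  # cosine of the angle to the y axis
    uz_squared = 1.0 - ux * ux - uy * uy
    if uz_squared < 0.0:
        return None  # the two cones have no common line
    # Keep the z >= 0 solution; the mirror solution is assumed invalid.
    return (ux, uy, math.sqrt(uz_squared))


valid = bearing_from_two_cones(0.6 * 0.2 / 343.0, 0.0, 0.2, 0.2)
missing = bearing_from_two_cones(0.9 * 0.2 / 343.0, 0.9 * 0.2 / 343.0, 0.2, 0.2)
```

In the second call each delay is individually plausible, yet together they describe two cones with no common line, so a modest error in either TDE is enough to leave the method without a bearing.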
Therefore, there is a need for an acoustic location detection technique with desirable resolution that is robust to noise and reverberation.