1. Field of the Invention
The present invention is directed to audio conferencing systems, and more particularly to a method of beamformer design that equalizes the amount of acoustic coupling among a finite number of beams covering a desired spatial span while preserving directivity characteristics.
2. Description of the Related Art
Spatial directivity in audio conferencing systems can be achieved either through directional microphones or through proper combination of several omni-directional microphones (referred to as microphone array technology).
Beamforming may be used in a microphone array to discriminate a source position in a “noisy” environment by “weighting” or modifying the gain of the signal from each microphone to create a beam in a desired “look” direction toward the source (i.e. talker).
For full-duplex operation, acoustic echo cancellation must be performed to prevent reverberation, howling, etc. (see M. Brandstein and D. Ward, “Microphone Arrays: Signal Processing Techniques and Applications”, Springer Verlag, 2001, and H. Buchner, W. Herbordt, W. Kellermann, “An Efficient Combination of Multi-Channel Acoustic Echo Cancellation With a Beamforming Microphone Array”, Proc. Int. Workshop on Hands-Free Speech Communication (HSC), pp. 55-58, Kyoto, Japan, April, 2001). One approach is to perform acoustic echo cancellation on all the microphone signals in parallel, which is computationally intensive. A second approach is to perform acoustic echo cancellation on the spatially filtered signal at the output of the beamformer (i.e. the output signal of the particular microphone facing the “look direction” at any given point in time).
The challenge that this second approach presents to acoustic echo cancellation is accommodating the changes in the characteristics of the directional signal as the system points to different spatial areas. For example, the acoustic echo path as well as the room characteristics (background noise, etc.) may change suddenly as the system changes its look direction, for instance when switching to a different talker. As a result, the acoustic echo cancellation algorithm must re-converge to the new characteristics (for instance a new echo path) each time the system changes its look direction. These transitions result in under-performance of the system in terms of acoustic echo cancellation.
When a microphone array is disposed within a physically asymmetrical enclosure, variations in the acoustic echo path for different “look” directions can be so significant that the acoustic echo canceller cannot provide reasonable performance without special design enhancements to trace such sudden echo path variations.
One method has been proposed in Canadian Patent Application No. 2,413,217 to deal with the effects of the problem set forth above by saving to (and retrieving from) memory the information that characterizes each of a finite number of look directions, or regions of focus, that cover the entire spatial span of the system. Each time a change in the look direction occurs, the system saves the workspace with essential acoustic characteristics captured by the full-duplex acoustic echo cancellation algorithm in the current sector. It also retrieves from memory the corresponding workspace for the new region of focus (captured the last time the sector was used). The acoustic echo cancellation then takes place for the new region of focus with the retrieved information.
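By way of illustration only, the save-and-retrieve behaviour described above might be sketched as follows in Python. The names Workspace, SectorWorkspaces and switch_to, and the two fields stored per sector, are hypothetical and are not taken from the cited application; the sketch only shows the bookkeeping performed when the look direction changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Workspace:
    """Hypothetical per-sector state captured by the echo canceller."""
    echo_path: List[float] = field(default_factory=list)  # assumed: adaptive filter taps
    noise_level: float = 0.0                              # assumed: background noise estimate


class SectorWorkspaces:
    """Keeps one workspace per look direction (sector)."""

    def __init__(self) -> None:
        self._saved: Dict[int, Workspace] = {}
        self.current_sector: Optional[int] = None
        self.active = Workspace()

    def switch_to(self, new_sector: int) -> Workspace:
        # Save the workspace adapted while the current sector was in use.
        if self.current_sector is not None:
            self._saved[self.current_sector] = self.active
        # Retrieve the workspace captured the last time the new sector was
        # used, or start from an empty one if the sector is new.
        self.active = self._saved.get(new_sector, Workspace())
        self.current_sector = new_sector
        return self.active
```

In such a sketch the echo canceller would continue adapting the returned workspace until the next call to switch_to.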
This method reduces the negative effects on echo cancellation of variations in the acoustic echo path and room characteristics when the beams switch from one look direction to another. However, even with this approach it is desirable that the various beamformers covering the whole angular span of the product present similar characteristics in terms of echo cancellation. Fewer differences between the beamformers result in a more precise estimation of the acoustic signal characteristics (thereby improving the quality of echo cancellation) and in less information being required to reside in the workspaces saved to and retrieved from memory (thereby resulting in code and data memory savings).
One method to reduce the variations in the acoustic characteristics for the different sectors is to design the beamformers such that all sectors have the same response to the direct path and main energy component of the acoustic coupling; that is, the loudspeaker signal. This can be achieved through proper beamformer design. Techniques are known for designing beamformers under desired response constraints whereby a linear constraint is imposed to provide the same value of the response to the loudspeaker signal for all beamformers (i.e. all combinations of beamformer weights applied to the microphone signals). For example, see Barry D. Van Veen and Kevin M. Buckley, “Beamforming: a versatile approach to spatial filtering”, IEEE ASSP magazine, April 1988, and James G Ryan. “Near-field beamforming using microphone arrays”, PhD thesis, Carleton University, November 1999.
One classical formulation of beamformer design is the Minimum-Variance formulation. In this approach, for each frequency $v$ of interest, the frequency-domain beamformer may be expressed as a complex weight vector $W(v)$ of length $M$ (where $M$ is the number of microphones used). The response of the beamformer to a signal $S$ at the frequency $v$ is then written as
$$BF(S,v) = W^H(v)\,S(v),$$
where $W^H(v)$ denotes the Hermitian transpose (or complex conjugate transpose) of $W(v)$.
The Minimum-Variance-Distortionless-Response (MVDR) formulation of the optimization problem is as follows:
$$\min_{W}\; W^H(v)\,R(v)\,W(v)$$
subject to the constraint $W^H(v)\,S(v)=1$, where $R(v)$ is the noise correlation matrix. This optimization problem has the following explicit solution:
$$W(v) = \frac{R^{-1}(v)\,S(v)}{S^H(v)\,R^{-1}(v)\,S(v)} \qquad \text{(MVDR\_1)}$$
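By way of illustration only, the explicit solution (MVDR_1) can be evaluated for a single frequency as sketched below in Python with NumPy. The function name mvdr_weights, the randomly generated correlation matrix and the far-field steering vector are assumptions made for the example and do not come from the cited references.

```python
import numpy as np


def mvdr_weights(R, S):
    """Return W = R^{-1} S / (S^H R^{-1} S) for one frequency."""
    Rinv_S = np.linalg.solve(R, S)        # R^{-1} S without forming the inverse
    return Rinv_S / (S.conj() @ Rinv_S)   # scale so that W^H S = 1


# Hypothetical data: M = 4 microphones, Hermitian positive-definite R.
rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)
S = np.exp(-1j * np.pi * 0.3 * np.arange(M))   # assumed steering vector

W = mvdr_weights(R, S)
response = W.conj() @ S                  # BF(S, v) = W^H(v) S(v)
variance = (W.conj() @ R @ W).real       # minimized quantity W^H R W
```

Here response evaluates the distortionless constraint and variance evaluates the quantity being minimized; np.linalg.solve is used in place of an explicit matrix inverse purely as a numerical convenience.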
In terms of an efficient time-domain implementation of the beamformer, a FIR filter can be designed to approximate the frequency response of the beamformer weights for each microphone, as explained in James G Ryan. “Near-field beamforming using microphone arrays”. PhD thesis, Carleton University, November 1999, referred to above.
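By way of illustration only, one simple way of obtaining such FIR filters is frequency sampling, sketched below in Python with NumPy. This is an assumed design technique chosen for brevity and is not necessarily the one used in the cited thesis; the function name fir_from_weights and the uniform frequency grid from zero to the Nyquist frequency are likewise assumptions.

```python
import numpy as np


def fir_from_weights(W_bins, n_taps):
    """Frequency-sampling FIR design for each microphone.

    W_bins: complex array of shape (M, n_taps // 2 + 1) holding the
            beamformer weights at uniformly spaced frequencies from
            0 to Nyquist (assumed sampling grid).
    Returns a real array of shape (M, n_taps) of FIR coefficients.
    """
    # The beamformer output is W^H X, so the filter applied to
    # microphone m has frequency response conj(W_m).
    return np.fft.irfft(np.conj(W_bins), n=n_taps, axis=-1)


# Hypothetical check: weights derived from known real filters are recovered.
rng = np.random.default_rng(2)
M, n_taps = 3, 16
h_ref = rng.standard_normal((M, n_taps))
W_bins = np.conj(np.fft.rfft(h_ref, axis=-1))
h = fir_from_weights(W_bins, n_taps)
```

In practice the resulting impulse responses would typically also be delayed and windowed to obtain causal filters of reasonable length; those steps are omitted from the sketch.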
There is one linear constraint in the MVDR formulation, $W^H(v)\,S(v)=1$, which guarantees a distortionless response.
In the case of several linear constraints, the constraints may be written as
$$W^H(v)\,C(v) = G(v),$$
where $C(v)$ is the constraint matrix (size $M$ by $K$, where $K<M$ is the number of constraints) and $G(v)$ is the constraint response vector (row vector of size $K$). The explicit solution is then given by the following formula:
$$W(v) = R^{-1}(v)\,C(v)\,\left[C^H(v)\,R^{-1}(v)\,C(v)\right]^{-1} G^H(v) \qquad \text{(MVDR\_2)}$$
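By way of illustration only, the multiple-constraint solution given above can be evaluated for a single frequency as sketched below in Python with NumPy. The function name lcmv_weights and the two example directions (a look direction with unit response and an interference direction with null response) are assumptions made for the example.

```python
import numpy as np


def lcmv_weights(R, C, G):
    """Return W = R^{-1} C [C^H R^{-1} C]^{-1} G^H for one frequency.

    R: (M, M) Hermitian noise correlation matrix
    C: (M, K) constraint matrix, K < M
    G: (K,)   constraint response (row) vector, so that W^H C = G
    """
    Rinv_C = np.linalg.solve(R, C)            # R^{-1} C
    gram = C.conj().T @ Rinv_C                # C^H R^{-1} C, size K by K
    # G^H is the complex conjugate of G taken as a column vector.
    return Rinv_C @ np.linalg.solve(gram, G.conj())


# Hypothetical data: M = 6 microphones, K = 2 constraints.
rng = np.random.default_rng(1)
M = 6
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)
look = np.exp(-1j * np.pi * 0.2 * np.arange(M))     # assumed look direction
interf = np.exp(-1j * np.pi * 0.7 * np.arange(M))   # assumed interferer
C = np.column_stack([look, interf])
G = np.array([1.0, 0.0], dtype=complex)             # unit gain, then null

W = lcmv_weights(R, C, G)
achieved = W.conj() @ C                             # W^H C
```

The vector achieved holds the response actually obtained for each constraint and can be compared with G.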
The above-described approach can be used to ensure a “null” response in a certain direction (for example a direction of interference). It can also be applied to the problem set forth above by equalizing the response of the beamformers to the loudspeaker signal. To that end, the response of each of J beamformers can be constrained to be equal to a given arbitrary value “g”, chosen a priori.
Letting $S_j(v)$, $1 \le j \le J$, be the “look direction” in connection with which the j'th beamformer is to give distortionless response, and $\tilde{S}(v)$ be the loudspeaker signal, then, for each individual beamformer weights vector $W_j(v)$, the constraints may be written as:
$$W_j^H(v)\,C_j(v) = G_j(v),$$
where
$$C_j(v) = [\,S_j(v)\;\;\tilde{S}(v)\,]$$
is the constraint matrix ($M$ rows and 2 columns) and
$$G_j(v) = [\,1\;\;g\,]$$
is the constraint response vector. The solution, for each “sector” $j$, is then given by formula (MVDR_2).
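By way of illustration only, the per-sector constraints described above might be applied as sketched below in Python with NumPy. The function names, the random loudspeaker coupling vector, the value chosen for g and the use of a single noise correlation matrix for all sectors are assumptions made for the example.

```python
import numpy as np


def lcmv_weights(R, C, G):
    """Multiple-constraint solution R^{-1} C [C^H R^{-1} C]^{-1} G^H."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, G.conj())


def equalized_beamformers(R, looks, S_ls, g):
    """One weight vector per sector, with C_j = [S_j, S_ls] and G_j = [1, g]."""
    G = np.array([1.0, g], dtype=complex)
    return [lcmv_weights(R, np.column_stack([S_j, S_ls]), G) for S_j in looks]


# Hypothetical data: M = 6 microphones, J = 4 sectors.
rng = np.random.default_rng(4)
M, J = 6, 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)        # assumed common to all sectors
looks = [np.exp(-1j * np.pi * np.cos(t) * np.arange(M))
         for t in np.linspace(0.3, 2.8, J)]
S_ls = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # assumed coupling
g = 0.1 * np.exp(1j * 0.5)                # arbitrary value chosen a priori

weights = equalized_beamformers(R, looks, S_ls, g)
coupling = np.array([W.conj() @ S_ls for W in weights])
look_gain = np.array([W.conj() @ S_j for W, S_j in zip(weights, looks)])
```

In this sketch coupling holds the response of each beamformer to the loudspeaker signal and look_gain holds its response in its own look direction.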
The main drawback of this design method is that the resulting beamformers are highly dependent on the arbitrary choice of the complex coupling response value (g). The choice of the magnitude and/or phase of this value may impose unnecessary stress on the solution of the optimization problem, resulting in a loss of directivity. To address this issue, an iterative procedure can be used to find, for each frequency, the coupling response value, g, such that optimal beamformers are obtained by (MVDR_2). One example of a criterion that can be used for the optimization problem is the cumulated Minimum Variance criterion:
$$F(g,v) = \sum_{j=1}^{J} W_j^H(g,v)\,R_j(v)\,W_j(g,v)$$
Such an iterative procedure, however, is computationally expensive and is prone to precision problems associated with the optimization procedure used to find the optimum.
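By way of illustration only, a search for the coupling response value at a single frequency might be sketched as below in Python with NumPy. An exhaustive search over a polar grid of candidate values is substituted here for the iterative procedure, which the text does not specify; the function names, the grid and the random test data are assumptions made for the example.

```python
import numpy as np


def lcmv_weights(R, C, G):
    """Multiple-constraint solution R^{-1} C [C^H R^{-1} C]^{-1} G^H."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, G.conj())


def cumulated_variance(g, Rs, looks, S_ls):
    """F(g, v): sum over sectors j of W_j^H R_j W_j at one frequency."""
    G = np.array([1.0, g], dtype=complex)
    total = 0.0
    for R_j, S_j in zip(Rs, looks):
        W_j = lcmv_weights(R_j, np.column_stack([S_j, S_ls]), G)
        total += (W_j.conj() @ R_j @ W_j).real
    return total


def search_coupling_value(Rs, looks, S_ls, mags, phases):
    """Exhaustive search over candidate g values (assumed strategy)."""
    candidates = [m * np.exp(1j * p) for m in mags for p in phases]
    costs = [cumulated_variance(g, Rs, looks, S_ls) for g in candidates]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]


# Hypothetical data: M = 6 microphones, J = 3 sectors.
rng = np.random.default_rng(3)
M, J = 6, 3
Rs = []
for _ in range(J):
    A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    Rs.append(A @ A.conj().T + M * np.eye(M))
looks = [np.exp(-1j * np.pi * np.cos(t) * np.arange(M)) for t in (0.5, 1.5, 2.5)]
S_ls = rng.standard_normal(M) + 1j * rng.standard_normal(M)

mags = np.linspace(0.0, 1.0, 21)
phases = np.linspace(-np.pi, np.pi, 24, endpoint=False)
g_best, f_best = search_coupling_value(Rs, looks, S_ls, mags, phases)
```

Each candidate value requires J constrained solutions, and the search must be repeated for every frequency of interest, which gives an indication of the computational cost; the accuracy of the result is moreover limited by the resolution of the grid.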