This invention relates to audio signal processing and, in particular, to a circuit that estimates direction of arrival using plural microphones.
As used herein, “telephone” is a generic term for a communication device that utilizes, directly or indirectly, a dial tone from a licensed service provider. For the sake of simplicity, the invention is described in the context of a telephone but has broader utility; e.g. communication devices that do not utilize a dial tone, such as radio frequency transceivers or intercoms.
This invention finds use in many applications where the internal electronics are essentially the same but the external appearance of the device is different. FIG. 1 illustrates a conference phone or speaker phone such as found in business offices. Telephone 10 includes microphones 11, 12, 13, and speaker 15 in a sculptured case.
FIG. 2 illustrates what is known as a hands free kit for providing audio coupling to a cellular telephone (not shown). Hands free kits come in a variety of implementations but generally include case 16, powered speaker 17 and plug 18, which fits an accessory outlet or a cigarette lighter socket in a vehicle. Case 16 may contain more than one microphone, or one of the microphones (not shown) may be separate and plug into case 16. The external microphone is for placement as close to a user as possible, e.g. clipped to the visor in a vehicle. A hands free kit may also include a cable for connection to a cellular telephone or have a wireless connection, such as a Bluetooth® interface. A hands free kit in the form of a head set is powered by internal batteries but is electrically similar to the apparatus illustrated in FIG. 2.
Today, hands free communication has become accepted, even expected, by people unfamiliar with technology. Thus, hands free communication is often attempted in harsh, i.e., noisy, acoustical environments such as automobiles, airports, and restaurants. As used herein, “noise” refers to any unwanted sound, whether the unwanted sound is periodic, purely random, or somewhere in between. As such, noise includes background music, voices (herein referred to as “babble”) of people other than the desired speaker, tire noise, wind noise, and so on. Automobiles can be especially noisy environments, which makes the invention particularly useful for hands free kits. Moreover, the noise will often be loud relative to the desired speech. Hence, it is essential to reduce noise in order to improve the quality of a conversation.
Many digital signal processing techniques have been proposed for reducing noise. In products with a single microphone, reducing noise is quite difficult when the desired speech and the noise share the same frequency spectrum. It is difficult for these techniques to remove noise without damaging the desired speech.
If the origin of the noise and the origin of the desired speech are spatially separated, then one can theoretically extract a clean speech signal from a noisy speech signal. A spatial separation algorithm needs more than one microphone to obtain the information that is necessary to extract the clean speech signal. Many spatial domain algorithms have been widely used in other applications, such as radio frequency (RF) antennas. The algorithms designed for other applications can be used for speech but not directly. For example, algorithms designed for RF antennas assume that the desired signal is narrow band. Speech is relatively broad band, 0-8 kHz. Other known algorithms are based on Independent Component Analysis (ICA). Using two or more microphones will improve the noise reduction performance of a hands free kit whether a spatial separation algorithm or an ICA based algorithm is used. The invention is based on a variation of a spatial separation algorithm.
FIG. 3 illustrates a classic spatial separation system in which the signal from a first microphone is filtered in an adaptive filter and subtracted from the signal from a second microphone; e.g. see U.S. Pat. No. 7,146,013 (Saito et al.). A control loop, indicated by the dashed line, adjusts filter parameters for minimal noise.
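The arrangement of FIG. 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the circuit of the cited patent: the normalized least mean squares (NLMS) update, the function name `nlms_noise_canceller`, the tap count, the step size, and the test signals are all hypothetical choices, since the text does not name an adaptation rule.

```python
import random

def nlms_noise_canceller(reference, primary, taps=4, mu=0.5, eps=1e-8):
    """Adaptively filter `reference` and subtract the result from `primary`.

    Returns the residual (primary minus filtered reference), sample by sample.
    NLMS is an assumed update rule; the text does not specify one.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, primary):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # output of the subtraction
        power = sum(h * h for h in history) + eps
        # Control loop: adjust the weights to reduce the residual power.
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual

# Hypothetical test signals: white noise picked up by the first microphone
# reaches the second microphone one sample later at 0.8 of its amplitude.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
second_mic = [0.0] + [0.8 * n for n in noise[:-1]]

residual = nlms_noise_canceller(noise, second_mic)
early_power = sum(e * e for e in residual[:100]) / 100
late_power = sum(e * e for e in residual[-100:]) / 100
```

In this noise-only test the residual power falls as the filter converges. With speech present in the second microphone signal, the residual would carry the speech plus whatever noise the filter fails to model.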
Because a signal can be analog or digital, a block diagram can be interpreted as hardware, software (e.g. a flow chart), or a mixture of hardware and software. Programming a microprocessor is well within the ability of those of ordinary skill in the art, either individually or in groups.
Those of skill in the art recognize that, once an analog signal is converted to digital form, all subsequent operations can take place in one or more suitably programmed microprocessors. Use of the word “signal”, for example, does not necessarily mean either an analog signal or a digital signal. Data in memory, even a single bit, can be a signal. A signal stored in memory is accessible by the entire system, not just the function or block with which it is most closely associated. Those of skill in the art know that “subtraction” in binary is addition (one number is inverted, incremented, and added to the other). Where the inversion takes place is a matter of design. For this reason, a plus sign is used to represent combining two or more signals.
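The remark that binary subtraction is addition can be checked with a short Python sketch of two's complement arithmetic. The function name `subtract_via_addition` and the 16-bit word width are assumptions made for illustration only.

```python
def subtract_via_addition(a: int, b: int, bits: int = 16) -> int:
    """Compute a - b using only inversion, increment, and addition.

    Operands are treated as `bits`-wide two's complement words (assumed width).
    """
    mask = (1 << bits) - 1
    inverted = ~b & mask                  # invert every bit of b
    negated = (inverted + 1) & mask       # increment to form the two's complement
    total = ((a & mask) + negated) & mask # add, discarding any carry out
    # Reinterpret the word as a signed value.
    if total & (1 << (bits - 1)):
        total -= 1 << bits
    return total

difference = subtract_via_addition(7, 3)          # 4
negative_difference = subtract_via_addition(3, 7) # -4
```

The inversion and increment are applied here to the subtrahend; as the text notes, where the inversion takes place is a matter of design.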
FIG. 4 illustrates another spatial separation system wherein voice activity detector 31 enables adaptation by filter 32 when voice is detected; e.g. see U.S. Pat. No. 7,218,741 (Balan et al.). FIG. 5 is yet another spatial separation system wherein direction of arrival is used to enable adaptation when sound is detected in the look direction; e.g. see U.S. Pat. No. 7,426,464 (Hui et al.).
An outline of Spatial Separation Algorithms is as follows.
Active Noise Cancellation
Beam Former
    Fixed
        Delay and Sum
        Filter and Sum
    Adaptive
        Generalized Side Lobe Cancellation (GSC)
            fixed beam former
            blocking matrix
                delay and subtract beam former
            plural input adaptive filters
In FIG. 6, fixed beam former 41 forms a beam towards a look direction. The performance of fixed beam former 41 is not sufficient because of beam width, due to side lobes in the beam. The main objective of GSC is to reduce the side lobe levels, hence the name. The GSC uses blocking matrix 42 that forms a null beam in the look direction. If there is no reverberation, the output of blocking matrix 42 should not contain any signals that are coming from the look direction.
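A fixed delay and sum beam former, the simplest entry in the outline, can be sketched as follows. The sketch assumes whole-sample steering delays and two microphones; the helper names `delay`, `delay_and_sum`, and `pulse`, and the test pulses, are hypothetical and are not taken from FIG. 6.

```python
def delay(signal, n):
    """Delay `signal` by n whole samples, zero-padding the start and keeping the length."""
    return ([0.0] * n + list(signal))[:len(signal)]

def delay_and_sum(channels, delays):
    """Time-align each channel toward the look direction, then average."""
    aligned = [delay(ch, d) for ch, d in zip(channels, delays)]
    return [sum(samples) / len(aligned) for samples in zip(*aligned)]

def pulse(index, length=8):
    """Unit pulse at `index`, used as a stand-in for a sound arrival."""
    return [1.0 if i == index else 0.0 for i in range(length)]

# Assumed geometry: sound from the look direction reaches microphone 0
# two samples before microphone 1, so microphone 0 is delayed by two samples.
steering = [2, 0]

on_axis = delay_and_sum([pulse(2), pulse(4)], steering)   # arrivals line up
off_axis = delay_and_sum([pulse(2), pulse(2)], steering)  # arrivals do not line up
```

The on-axis pulse emerges at full amplitude, while the off-axis pulse is smeared into two half-amplitude pulses: attenuated but not removed, which is the limitation that motivates the GSC.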
Blocking matrix 42 can take many forms. For example, with two microphones, the signal from one microphone is delayed an appropriate amount to align the outputs in time. The outputs are subtracted to remove all the signals that are coming from the look direction, forming a null. This is also known as a delay and subtract beam former. If the number of microphones is more than two, then adjacent microphones are time aligned and subtracted to produce (n−1) outputs. In ideal conditions, all the (n−1) outputs should contain only signals arriving from directions other than the look direction. The (n−1) outputs from blocking matrix 42 serve as inputs to (n−1) adaptive filters to cancel out the signals that leaked through the side lobes of the fixed beam former. The outputs of (n−1) adaptive filters are subtracted from the fixed beam former output in subtraction circuit 43. The filters and subtraction circuit are collectively referred to as multiple input canceller 44.
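The delay and subtract form of the blocking matrix can be sketched as below. The sketch assumes ideal, matched microphones and whole-sample delays; the names `delay`, `blocking_matrix`, and `pulse`, the three-microphone geometry, and the test pulses are hypothetical.

```python
def delay(signal, n):
    """Delay `signal` by n whole samples, zero-padding the start and keeping the length."""
    return ([0.0] * n + list(signal))[:len(signal)]

def blocking_matrix(channels, delays):
    """Time-align n channels, then subtract adjacent pairs to give n - 1 outputs."""
    aligned = [delay(ch, d) for ch, d in zip(channels, delays)]
    return [
        [a - b for a, b in zip(first, second)]
        for first, second in zip(aligned, aligned[1:])
    ]

def pulse(index, length=8):
    """Unit pulse at `index`, used as a stand-in for a sound arrival."""
    return [1.0 if i == index else 0.0 for i in range(length)]

# Assumed geometry: sound from the look direction reaches microphones 0, 1, 2
# at samples 2, 3, 4, so the steering delays are 2, 1, 0 samples.
steering = [2, 1, 0]

look = blocking_matrix([pulse(2), pulse(3), pulse(4)], steering)
interferer = blocking_matrix([pulse(2), pulse(2), pulse(2)], steering)
```

With three microphones there are two outputs. Both are zero for the look direction signal (the null), while the interferer, which reaches all microphones at the same sample, passes through and is available as a reference for the adaptive filters.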
The outputs of blocking matrix 42 will often contain some desired speech due to mismatches in the phase relationships of the microphones and the gains of the amplifiers (not shown) coupled to the microphones. Reverberation also causes problems. If the adaptive filters are adapting at all times, then they will train to speech from the blocking matrix, causing distortion at the subtraction stage.
Using a voice activity detector for control increases the sensitivity of a system to the quality of the detector. Similarly, using direction of arrival for control places a premium on accurately detecting direction, particularly if combined with voice activity detection. Thus, there is a need in the art for more accurately determining voice and direction.
In view of the foregoing, it is therefore an object of the invention to provide improved noise suppression using plural microphones.
Another object of the invention is to provide a method and apparatus for more accurately determining direction of arrival in a noise suppression circuit.
A further object of the invention is to provide improved control of adaptation in noise suppression circuits.