Ideally, a handset is held with the microphone near the user's mouth and the speaker near the user's ear. Often, particularly with cellular telephones, the positioning of the microphone is far from ideal, allowing the microphone to pick up extraneous and interfering sounds.
Many speech enhancement or noise reduction algorithms must detect desired speech in the presence of interfering sounds. Conventional voice activity detectors are not capable of distinguishing desired speech from interfering signals that resemble speech. Techniques that use spatial statistics can detect desired speech in the presence of various types of interfering sounds. Spatial statistics require more than one microphone to achieve the best performance. For example, a second microphone is located at the end of the handset with the speaker but pointing away from the speaker to avoid feedback.
FIG. 1 illustrates cellular telephone 10 having display 12 and keypad 13 in a folding case that closes about hinge 15. Microphone 17 is located at one end of cellular telephone 10 and speaker 18 is located at the opposite end of cellular telephone 10, much like the handsets of earlier telephones. Second microphone 19 is located on the outside of the case, pointing away from speaker 18, forming an array of two microphones with microphone 17.
Microphone 17 is a near-field microphone and microphone 19 is a far-field microphone. Microphone 17 and speaker 18 lie on axis 21 of cellular telephone 10. FIG. 2 is a profile of a person's head. Axis 22 intersects the ear canal and the mouth. Axes 21 and 22 are parallel to each other during a call, with speaker 18 (FIG. 1) located near the ear canal. With the axes thus aligned, the inter-microphone level difference is large. Unfortunately, cellular telephones are not always positioned in this manner. The near-field microphone is often shifted off axis by 60° or more. When this happens, the microphones in the array are approximately equidistant from the mouth of the user. Sound from the user is incident upon both microphones at approximately the same time and with approximately the same amplitude. The near-field microphone may also be moved out of the plane of the figure, further increasing the distance to the mouth of a user.
Using plural microphones, it is possible to estimate the direction of arrival of any sound incident on the array. If the direction of arrival range of a desired sound is known, then the direction of arrival estimate is a powerful statistic that can be used to detect the presence of this desired signal. Speech enhancement or noise reduction algorithms can aggressively remove interfering signals that are not arriving within the acceptance angle of the array.
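By way of illustration only, the following is a minimal sketch of one common way to estimate a direction of arrival with two microphones: find the inter-microphone delay that maximizes the cross-correlation, then convert that delay to an angle. The sketch assumes a far-field plane wave, a speed of sound of 343 m/s, and a known microphone spacing. The function names, the sampling rate, the 0.1 m spacing, and the acceptance angle test are hypothetical and are not taken from the description above.

```python
import math
import random

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def estimate_delay_samples(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] maximizing sum(x[i] * y[i + lag]).

    A positive lag means the sound reached microphone x before microphone y.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival_deg(x, y, fs_hz, mic_spacing_m):
    """Angle between the array axis (on the x side) and the arriving wave.

    Plane-wave assumption: delay = spacing * cos(theta) / c, so 0 degrees is
    end-fire toward microphone x and 90 degrees is broadside.
    """
    # Largest physically possible delay, in whole samples.
    max_lag = math.ceil(mic_spacing_m * fs_hz / SPEED_OF_SOUND_M_S)
    lag = estimate_delay_samples(x, y, max_lag)
    cos_theta = lag * SPEED_OF_SOUND_M_S / (fs_hz * mic_spacing_m)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def within_acceptance_angle(angle_deg, center_deg, half_width_deg):
    """Hypothetical acceptance test on the direction of arrival estimate."""
    return abs(angle_deg - center_deg) <= half_width_deg


# Synthetic example: the same noise burst reaches microphone y 7 samples late.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(256)]
y = [0.0] * 7 + x[:-7]
angle = direction_of_arrival_deg(x, y, fs_hz=48000, mic_spacing_m=0.1)
```

With these assumed values, a delay of 7 samples corresponds to an angle of roughly 60° from the array axis. A brute-force correlation is used here only for clarity; a practical implementation would typically operate on short frames and may use a frequency-domain method.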
If the acceptance angle of the array is wide, then the control derived using the direction of arrival estimate may not improve a speech enhancement or noise reduction algorithm. In a situation like this, it is desirable to use statistics other than the direction of arrival estimate to get better performance.
If the source of the interfering sounds and the source of the desired speech are spatially separated, then one can theoretically extract a clean speech signal from interfering sounds. A spatial separation algorithm needs more than one microphone to obtain the information that is necessary to extract the clean speech signal. Many spatial domain algorithms have been widely used in other applications, such as radio frequency (RF) antennas. The algorithms designed for other applications can be used for speech but not directly. For example, algorithms designed for RF antennas assume that the desired signal is narrow band whereas speech is relatively broad band, 0-8 kHz.
Inter-Microphone Level Difference (IMD)
The power of acoustic waves propagating in a free field outward from a source will decrease as a function of distance, r, from the center of the source. Specifically, the power is inversely proportional to the square of the distance. It is known from acoustical physics that the effect of r² loss becomes insignificant in a reverberant field.
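As a worked illustration of the inverse square law, the following sketch computes the free-field level difference between two points at different distances from a point source. The function name and the example distances are hypothetical and chosen only to show the trend; reverberation is ignored.

```python
import math


def free_field_level_difference_db(r_near_m, r_far_m):
    """Level difference in dB between two distances from a free-field point source.

    Power falls off as 1 / r**2, so the power ratio is (r_far / r_near)**2,
    which is 20 * log10(r_far / r_near) in decibels.
    """
    if r_near_m <= 0 or r_far_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(r_far_m / r_near_m)


# Assumed distances: a talker 5 cm from one microphone and 15 cm from the other.
near_source_db = free_field_level_difference_db(0.05, 0.15)

# Assumed distances: an interferer 2.0 m and 2.1 m from the same two microphones.
far_source_db = free_field_level_difference_db(2.00, 2.10)
```

Under these assumptions the nearby source produces a difference of about 9.5 dB, whereas the distant source produces less than 0.5 dB, which is why the level difference can separate near-field from far-field sounds in a free field.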
If a dual microphone array is in the vicinity of the source of the desired signal, then the r² loss phenomenon can be exploited by comparing signal levels between far and near microphones. The inter-microphone level difference can distinguish a near-field desired signal from a far-field directional signal or a diffuse-field interfering signal, if the near-field signal is sufficiently louder than the others; e.g. see U.S. Pat. No. 7,512,245 (Rasmussen et al.).
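The comparison of signal levels described above can be sketched as follows. This is an illustrative example only and is not the method of the cited patent: the frame-based power estimate, the function names, and the 6 dB threshold are assumptions made for the sketch.

```python
import math


def frame_power(samples):
    """Mean-square power of one frame of samples."""
    return sum(s * s for s in samples) / len(samples)


def inter_mic_level_difference_db(near_frame, far_frame, eps=1e-12):
    """Near-microphone level minus far-microphone level, in dB.

    eps avoids taking the logarithm of zero on silent frames.
    """
    ratio = (frame_power(near_frame) + eps) / (frame_power(far_frame) + eps)
    return 10.0 * math.log10(ratio)


def is_near_field(near_frame, far_frame, threshold_db=6.0):
    """Hypothetical decision rule: declare near-field above a fixed threshold."""
    return inter_mic_level_difference_db(near_frame, far_frame) > threshold_db


# Synthetic frames: a tone that is much louder at the near microphone,
# and a tone that arrives at both microphones with equal level.
tone = [math.sin(2.0 * math.pi * k / 16.0) for k in range(160)]
talker_near, talker_far = tone, [0.3 * s for s in tone]
interferer_near, interferer_far = [0.5 * s for s in tone], [0.5 * s for s in tone]

talker_imd_db = inter_mic_level_difference_db(talker_near, talker_far)
interferer_imd_db = inter_mic_level_difference_db(interferer_near, interferer_far)
```

In this synthetic case the talker frames differ by about 10.5 dB and exceed the assumed threshold, while the equal-level frames differ by 0 dB and do not. A fixed threshold of this kind is exactly what becomes unreliable when the level difference shrinks, as discussed in the paragraphs that follow.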
As the distance from an acoustic source to a microphone increases, the reverberant sounds become comparable in magnitude to the direct path sounds. The measured propagation loss then no longer represents the direct path inverse square law loss. Similarly, inter-microphone level difference increases with increasing spacing of the microphones, which means that the statistic is often insufficient for compact cellular telephones.
It has been found that the inter-microphone level difference does not clearly detect the presence of near-field sounds in the presence of a far-field directional sound or when the axis is offset by more than 45°. Thus, inter-microphone level difference alone is not a good statistic to decide whether or not the sounds incident on the microphone array include a near-field sound.
In view of the foregoing, it is therefore an object of the invention to provide a reliable indication of near-field sounds to improve speech enhancement or noise reduction.
Another object of the invention is to improve the reliability of inter-microphone level difference as an indicator of near-field sounds.
A further object of the invention is to provide statistics for reliably detecting near-field sounds in the presence of either a far-field directional sound or a diffuse-field sound.
Another object of the invention is to provide a process and apparatus for exaggerating far-field directional signals or diffuse-field signals to improve near-field detection.
A further object of the invention is to provide a process and apparatus for detecting a near-field sound when the near-field sound is corrupted by either a far-field directional sound or a diffuse-field sound.
Another object of the invention is to provide improved near-field detection when a microphone array is positioned off-axis.