Conventional adaptive noise suppression algorithms have been around for some time. These conventional algorithms have used two or more microphones to sample both an (unwanted) acoustic noise field and the (desired) speech of a user. The noise relationship between the microphones is then determined using an adaptive filter (such as Least-Mean-Squares as described in Haykin & Widrow, ISBN#0471215708, Wiley, 2002, but any adaptive or stationary system identification algorithm may be used) and that relationship is used to filter the noise from the desired signal.
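The two-microphone arrangement described above can be illustrated with a minimal sketch of Least-Mean-Squares noise cancellation. It is not the method of any cited reference. It assumes one microphone picks up speech plus noise, the other picks up noise only, and the noise reaches the speech microphone through a short hypothetical two-tap path. The function names, tap count, step size, and test signals are illustrative assumptions.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=8, step_size=0.01):
    """Adaptive noise cancellation with an LMS FIR filter.

    primary:   samples from the speech microphone (speech + filtered noise)
    reference: samples from the noise microphone (assumed to be noise only)
    Returns the error signal, which serves as the noise-suppressed output.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Estimate the noise in the primary channel from the reference channel.
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate
        # LMS update: move the weights along the instantaneous gradient.
        weights = [w + 2.0 * step_size * error * h
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def mean_squared_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def demo(num_samples=4000, seed=0):
    """Return (error before filtering, error after filtering)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    # A sine wave stands in for speech (assumption, for illustration only).
    speech = [0.5 * math.sin(2.0 * math.pi * 0.01 * n)
              for n in range(num_samples)]
    # Hypothetical two-tap path from the noise source to the speech microphone.
    primary = [speech[n] + 0.8 * noise[n] + (0.3 * noise[n - 1] if n > 0 else 0.0)
               for n in range(num_samples)]
    cleaned = lms_noise_canceller(primary, noise)
    half = num_samples // 2  # skip the initial convergence period
    before = mean_squared_error(primary[half:], speech[half:])
    after = mean_squared_error(cleaned[half:], speech[half:])
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"error power before: {before:.4f}, after: {after:.4f}")
```

In this sketch the filter learns the noise relationship between the two channels, and the residual after subtraction is taken as the cleaned signal. If speech leaks into the reference microphone, which the sketch assumes does not happen, the same update would also remove part of the speech.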
Most conventional noise suppression systems currently in use for speech communication systems are based on a single-microphone spectral subtraction technique first developed in the 1970s and described, for example, by S. F. Boll in “Suppression of Acoustic Noise in Speech using Spectral Subtraction,” IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques have been refined over the years, but the basic principles of operation have remained the same. See, for example, U.S. Pat. No. 5,687,243 of McLaughlin, et al., and U.S. Pat. No. 4,811,404 of Vilmur, et al. There have also been several attempts at multi-microphone noise suppression systems, such as those outlined in U.S. Pat. No. 5,406,622 of Silverberg et al. and U.S. Pat. No. 5,463,694 of Bradley et al. Multi-microphone systems have not been very successful for a variety of reasons, the most compelling being poor noise cancellation performance and/or significant speech distortion. Primarily, conventional multi-microphone systems attempt to increase the SNR of the user's speech by “steering” the nulls of the system to the strongest noise sources. The number of noise sources this approach can remove is limited by the number of available nulls.
The Jawbone earpiece (referred to as the “Jawbone”), introduced by AliphCom of San Francisco, Calif., was the first known commercial product to use a pair of physical directional microphones (instead of omnidirectional microphones) to reduce environmental acoustic noise. The technology supporting the Jawbone is currently described under one or more of U.S. Pat. No. 7,246,058 by Burnett and/or U.S. patent application Ser. Nos. 10/400,282, 10/667,207, and/or Ser. No. 10/769,302. Generally, multi-microphone techniques make use of an acoustic-based Voice Activity Detector (VAD) to determine the background noise characteristics, where “voice” is generally understood to include human voiced speech, unvoiced speech, or a combination of voiced and unvoiced speech. The Jawbone improved on this by using a microphone-based sensor to construct a VAD signal using directly detected speech vibrations in the user's cheek. This allowed the Jawbone to aggressively remove noise when the user was not producing speech. The current Jawbone implementation also uses a pair of omnidirectional microphones to construct two virtual microphones that are used to remove noise from speech. The omnidirectional microphones are calibrated, that is, they both respond as similarly as possible when exposed to the same acoustic field. Calibration using standard techniques such as artificial mouths in acoustic boxes can be difficult, especially in noisy environments like a factory floor.