The following applications and patents are cited and are hereby incorporated by reference: U.S. patent application Ser. No. 09/252,874 filed Feb. 18, 1999; U.S. patent application Ser. No. 09/157,035, now U.S. Pat. No. 6,049,607, issued Apr. 11, 2000; U.S. patent application Ser. No. 09/055,709 filed Apr. 7, 1998; U.S. patent application Ser. No. 09/130,923 filed Aug. 6, 1998; U.S. patent application Ser. No. 08/672,899, now U.S. Pat. No. 5,825,898, issued Oct. 20, 1998; and International Application No. PCT/US99/21186. In addition, all documents cited herein are incorporated herein by reference, as are documents cited or referenced in documents cited herein.
The present invention relates to noise cancellation and reduction and, more specifically, to noise cancellation and reduction using sub-band processing and exponential smoothing.
Ambient noise added to speech degrades the performance of speech processing algorithms. Such algorithms may be used in dictation, voice activation, voice compression, and other systems. The ambient noise also degrades sound and voice quality and intelligibility. In such systems, it is desirable to reduce the noise and improve the signal-to-noise ratio (S/N ratio) without affecting the speech and its characteristics.
Near-field noise-canceling microphones provide a satisfactory solution but require that the microphone be close to the voice source (e.g., the mouth). In many cases, this is achieved by mounting the microphone at the end of a headset boom that positions it near the wearer's mouth. However, headsets have proven to be either uncomfortable to wear or too restrictive for operation in, for example, an automobile.
Microphone array technology in general, and adaptive beamforming arrays in particular, are especially effective against strong directional noise. These systems map the noise field and steer nulls toward the noise sources. The number of nulls is limited by the number of microphone elements and the available processing power. Such arrays also allow hands-free operation without a headset.
However, when the noise sources are diffuse, the performance of the adaptive system falls to that of a regular delay-and-sum microphone array, which is not always satisfactory. This is the case in highly reverberant environments, for example when noise is strongly reflected from the walls of a room and reaches the array from an effectively infinite number of directions. It is also the case in a car for some of the noise radiated from the chassis. Another drawback of the array solution is that it requires multiple microphones, which increases the physical size and the price of the solution. It also rules out adding noise reduction to existing systems that already have a single microphone and cannot accommodate additional microphones.
One proposed solution for further reducing the noise is the spectral subtraction technique, which estimates the noise magnitude spectrum of the polluted signal by measuring it during non-speech intervals detected by a voice switch and then subtracts the noise magnitude spectrum from the signal. This method, described in detail in Steven F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 2, Apr. 1979, achieves good results for stationary diffuse noise that is not correlated with the speech signal. However, if the spectral subtraction is uncontrolled, it creates artifacts, sometimes described as musical noise, that may reduce the performance of the speech algorithm (such as voice recording or voice activation).
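A minimal sketch of Boll-style magnitude subtraction on one frame, assuming the frame's DFT bins and a set of frames already flagged as non-speech by a voice switch (not shown) are available. The function names are hypothetical. The per-bin `abs` and `cmath.phase` calls correspond to the square-root and phase bookkeeping whose cost is a known drawback of this approach.

```python
import cmath


def estimate_noise(nonspeech_frames):
    """Average per-bin magnitude over frames labeled as non-speech."""
    count = len(nonspeech_frames)
    return [sum(abs(frame[k]) for frame in nonspeech_frames) / count
            for k in range(len(nonspeech_frames[0]))]


def spectral_subtract(frame_spectrum, noise_magnitude):
    """Subtract the noise magnitude from each bin and keep the noisy phase."""
    out = []
    for x, n in zip(frame_spectrum, noise_magnitude):
        mag = abs(x)                      # sqrt(re^2 + im^2) for every bin
        phase = cmath.phase(x)            # phase stored so it can be reapplied
        clean = max(mag - n, 0.0)         # negative results clipped to zero
        out.append(cmath.rect(clean, phase))
    return out
```

Clipping negative differences to zero is what leaves isolated surviving bins from frame to frame, which is the usual explanation for musical noise.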
Another problem is that computing the magnitude of the FFT result is computationally complex. It involves square and square-root calculations, which are very expensive in terms of computation load. Yet another problem is reattaching the phase information to the noise-free magnitude spectrum to obtain the input for the IFFT. This requires calculating the phase, storing it, and applying it to the magnitude data, all of which are expensive in terms of computation and memory. Shortening the FFT widens the bandwidth of each bin and improves stability but reduces the performance of the system. Averaging over time, moreover, smears the data and therefore cannot be extended over more than a few frames.
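The per-bin operations behind these costs, together with the FFT-length trade-off, can be written out as follows. Here $X_k$ denotes bin $k$ of an $N$-point FFT of a frame sampled at $f_s$ and $\hat{N}_k$ the estimated noise magnitude. This notation is introduced for illustration only.

```latex
|X_k| = \sqrt{\operatorname{Re}(X_k)^2 + \operatorname{Im}(X_k)^2},
\qquad
\hat{S}_k = \max\!\big(|X_k| - \hat{N}_k,\ 0\big)\, e^{\,j \angle X_k},
\qquad
\Delta f = \frac{f_s}{N}
```

Each bin therefore needs a square root for the magnitude and a phase computation plus complex exponential to restore the phase. Halving $N$ doubles the bin bandwidth $\Delta f$.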
An improved spectral subtraction technique has been proposed in U.S. patent application Ser. No. 09/252,874, filed Feb. 18, 1999. The improved system has a threshold detector that precisely detects the positions of the noise elements, even within continuous speech segments. It does so by determining whether frequency spectrum elements, or bins, of the input signal are within a threshold set according to a minimum value of the frequency spectrum elements over a preset period of time. More precisely, the threshold is set according to current and future minimum values of the frequency spectrum elements. Thus, for each syllable, the energy of the noise elements is determined by a separate threshold determination without examining the overall signal energy, which provides a good and stable noise estimate. In addition, the system preferably sets the threshold continuously and resets it within a predetermined period of, for example, five seconds.
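A minimal sketch of minimum-based noise detection for a single bin, assuming per-frame magnitudes of that bin are available. The look-ahead window stands in for the "future minimum", and a bounded look-back window replaces the explicit periodic reset. Both substitutions, the function name, and the parameter values are assumptions, not the referenced application's method.

```python
def classify_noise_bins(magnitudes, factor=2.0, lookback=50, lookahead=10):
    """Flag each frame of one frequency bin as noise (True) or not (False).

    The threshold is `factor` times the minimum magnitude in a window that
    spans past, current, and future frames. All parameter values are
    hypothetical.
    """
    flags = []
    for i, m in enumerate(magnitudes):
        lo = max(0, i - lookback)
        hi = min(len(magnitudes), i + lookahead + 1)  # future frames need buffering
        floor = min(magnitudes[lo:hi])
        flags.append(m <= factor * floor)
    return flags
```

For example, `classify_noise_bins([1.0, 1.1, 5.0, 0.9, 1.2], lookback=2, lookahead=1)` flags only the 5.0 frame as non-noise. The look-ahead costs `lookahead` frames of delay.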
To reduce instability of the spectral estimation, the improved spectral subtraction technique applies a two-dimensional (2D) smoothing process to the signal estimate. A two-step smoothing function produces excellent results. It first averages neighboring frequency bins within each time frame and then applies an exponential time average to each frequency bin.
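The two-step smoothing can be sketched as follows, assuming a three-bin neighborhood in frequency and a single exponential weight `alpha` in time. Both choices are illustrative and not taken from the referenced application.

```python
def smooth_frequency(frame):
    """Step 1: average each bin with its immediate neighbors (edges use what exists)."""
    n = len(frame)
    out = []
    for k in range(n):
        neighbors = frame[max(0, k - 1):min(n, k + 2)]
        out.append(sum(neighbors) / len(neighbors))
    return out


def smooth_2d(frames, alpha=0.9):
    """Step 2: exponential time average of each frequency-smoothed bin."""
    state = None
    result = []
    for frame in frames:
        f = smooth_frequency(frame)
        if state is None:
            state = f  # initialize the time average with the first frame
        else:
            state = [alpha * s + (1 - alpha) * v for s, v in zip(state, f)]
        result.append(state)
    return result
```

A larger `alpha` gives steadier estimates but reacts more slowly. This is the smearing limitation that prevents extending the time average over many frames.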
To avoid the complexity of determining the phase of the frequency bins during subtraction in order to align the phases of the subtracted elements, the improved technique implements the subtraction as a filter multiplication. The filter function, for example a Wiener filter function or an approximation of it, is multiplied by the complex data of the frequency-domain audio signal.
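A sketch of subtraction by multiplication, assuming smoothed per-bin signal power `P` and noise power `N` are already available. The gain `H = 1 - N/P` is one common Wiener approximation and is chosen here as an assumption. The function names and the `floor` parameter are hypothetical.

```python
def wiener_gain(signal_power, noise_power, floor=0.0):
    """Approximate Wiener gain H = 1 - N/P per bin, clipped to [floor, 1]."""
    gains = []
    for p, n in zip(signal_power, noise_power):
        h = 1.0 - n / p if p > 0 else floor
        gains.append(min(1.0, max(floor, h)))
    return gains


def apply_gain(spectrum, gains):
    """Multiply complex bins by real gains; the phase is left untouched."""
    return [g * x for g, x in zip(gains, spectrum)]
```

Because each gain is real and non-negative, multiplying it into the complex bin scales the magnitude while preserving the phase. No explicit phase computation or storage is needed.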
However, these spectral subtraction techniques still require complex and computationally intensive FFT calculations to operate on the data in the frequency domain. Adding to the computation time is the latency incurred while waiting for enough samples to be buffered before the calculations can be performed. This latency results in an overall system delay that can cause difficulties in real-time applications. Also, the 2D smoothing process reduces the artifacts (musical noise), but they remain audible, especially when voice is not present. In quiet sections, this residual noise sounds artificial and can be annoying to listen to.
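As an illustration with assumed values ($N = 256$ samples, $f_s = 8$ kHz; neither figure comes from the text), the buffering delay of a single FFT frame is:

```latex
\tau_{\text{buffer}} = \frac{N}{f_s} = \frac{256}{8000\ \text{Hz}} = 32\ \text{ms}
```

The processing time for the FFT and IFFT comes on top of this buffering delay.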
It is therefore an object of this invention to provide a sub-band time-domain noise canceling system that has a simple yet efficient mechanism to estimate and subtract noise, even in poor signal-to-noise ratio situations and in cases of continuous fast speech.
It is another object of this invention to provide an efficient mechanism that improves the processing throughput by reducing the latency problem in related art systems.
It is yet another object of this invention to provide an efficient mechanism that removes the residual (musical) noise problem in related art systems.
In accordance with the foregoing objectives, the present invention provides a system that correctly determines the non-speech segments of the audio signal, thereby preventing erroneous processing of the noise canceling signal during the speech segments.
To attain the above objectives, the present invention provides a system comprising an input for receiving a digital signal that includes a noise signal component; a band splitter for dividing the digital input signal into a number of frequency-limited time-domain signal sub-bands; a number of noise processors, one for each sub-band, which cancel the noise signal components in the digital input signal; and a recombiner for recombining the noise-processed sub-bands into a digital output signal.
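The split, process, and recombine structure can be sketched as a small driver with pluggable stages. The names `noise_cancel`, `toy_split`, and `toy_recombine` are hypothetical. The toy two-band splitter is chosen only because its bands sum back exactly to the input; it is not the DFT filter bank the invention uses.

```python
from typing import Callable, List, Sequence

Band = List[float]


def noise_cancel(
    x: Sequence[float],
    split: Callable[[Sequence[float]], List[Band]],
    processors: List[Callable[[Band], Band]],
    recombine: Callable[[List[Band]], List[float]],
) -> List[float]:
    """Split into sub-bands, clean each band with its own processor, recombine."""
    bands = split(x)
    if len(bands) != len(processors):
        raise ValueError("need exactly one noise processor per sub-band")
    cleaned = [proc(band) for proc, band in zip(processors, bands)]
    return recombine(cleaned)


def toy_split(x: Sequence[float]) -> List[Band]:
    """Hypothetical two-band split: 2-tap moving average plus residual."""
    low = [(x[n] + (x[n - 1] if n > 0 else 0.0)) / 2 for n in range(len(x))]
    high = [v - lo for v, lo in zip(x, low)]
    return [low, high]


def toy_recombine(bands: List[Band]) -> List[float]:
    """Sum the bands sample by sample."""
    return [sum(samples) for samples in zip(*bands)]
```

For example, passing the low band unchanged and zeroing the high band maps `[2.0, 4.0, 6.0]` to `[1.0, 3.0, 5.0]`. Each band gets its own processor object, so per-band tuning fits naturally into this structure.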
A particular aspect of the present invention is that the band splitter divides the input beam into a number of frequency-limited sub-bands, preferably 16 evenly spaced bands, so that noise processing is performed on each frequency band separately. By splitting the signal into, for example, 16 channels, the present invention reduces the sampling rate that the noise processors must handle. Not only is this system much more manageable, but the noise processors can also be optimized for each frequency band separately, for example by adjusting thresholding parameters to the noise levels expected within a given band. The band splitter is, for example, a DFT filter bank that uses single-sideband modulation to divide the digital input signal.
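The sketch below shows a deliberately simplified DFT filter bank with a rectangular window whose length equals the number of bands. Each of the `num_bands` outputs receives one sample per `num_bands` input samples. At an assumed 8 kHz input rate with 16 bands, each band therefore runs at 500 Hz. The single-sideband modulation described above, which yields real-valued sub-bands, and a longer prototype filter for sharper band edges are both omitted. The function names are hypothetical.

```python
import cmath


def dft(block):
    """Plain O(n^2) DFT, adequate for a 16-point illustration."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analysis(x, num_bands=16):
    """Split x into num_bands complex sub-band signals, each decimated by num_bands."""
    usable = len(x) - len(x) % num_bands  # drop a trailing partial block
    bands = [[] for _ in range(num_bands)]
    for start in range(0, usable, num_bands):
        spec = dft(x[start:start + num_bands])
        for k in range(num_bands):
            bands[k].append(spec[k])
    return bands


def synthesis(bands):
    """Recombine sub-bands into a real time-domain signal."""
    num_bands = len(bands)
    out = []
    for frame in range(len(bands[0])):
        block = idft([bands[k][frame] for k in range(num_bands)])
        out.extend(v.real for v in block)
    return out
```

With unmodified bands, `synthesis(analysis(x))` reproduces `x` to rounding error. This makes it easy to check that any change in the output comes from the per-band noise processing.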
Each noise processor is made up of an exponential averager, a noise estimator, and a subtraction processor. The exponential averager computes a rolling average input value as a weighted average of the previous average value and the current input value. The noise estimator generates a band noise value by exponential smoothing, that is, a weighted average of the previous noise value and the current input value. The current input is treated as noise only if it does not exceed a predetermined multiple of a current minimum value. If it exceeds that multiple, the noise estimator does not use it to determine the new noise estimate. The subtraction processor generates a filter coefficient H from the rolling average input value and the band noise value, and multiplies the current input value by the filter coefficient to generate a noise-canceled value.
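A per-band processor combining the three stages is sketched below. Several details are assumptions: the averages operate on the magnitude `|x|` of each sub-band sample, `H = max(0, 1 - noise/avg)` is used as the coefficient, and the tracked minimum may creep upward slowly so that it can follow a rising noise floor. All constants and the class name are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BandNoiseProcessor:
    """Exponential averager, gated noise estimator, and subtraction processor.

    All constants are illustrative assumptions, not values from the source.
    """
    avg_alpha: float = 0.9      # weight on the previous rolling average
    noise_alpha: float = 0.99   # weight on the previous noise estimate
    gate: float = 3.0           # input counts as noise only up to gate * minimum
    min_rise: float = 0.001     # hypothetical: lets the tracked minimum creep up
    avg: float = 0.0
    noise: float = 0.0
    minimum: float = float("inf")

    def process(self, x: float) -> float:
        mag = abs(x)
        # Exponential averager: mix of the previous average and the current input.
        self.avg = self.avg_alpha * self.avg + (1 - self.avg_alpha) * mag
        # Minimum tracking, allowed to rise slowly with a changing noise floor.
        self.minimum = min(mag, self.minimum * (1 + self.min_rise))
        # Noise estimator: update only when the input is considered noise.
        if mag <= self.gate * self.minimum:
            self.noise = self.noise_alpha * self.noise + (1 - self.noise_alpha) * mag
        # Subtraction processor: coefficient H applied to the current input.
        h = max(0.0, 1.0 - self.noise / self.avg) if self.avg > 0 else 0.0
        return h * x
```

After a long run of steady noise, the output settles near zero. A sudden loud sample bypasses the noise update and passes through with a substantial gain. One instance would be created per sub-band, which allows per-band constants.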
Additionally, the subtraction processor may apply a minimum filter coefficient threshold: if the calculated coefficient is below a certain minimum, the calculated value is replaced with that minimum. This threshold can be used to control the amount of noise reduction. In addition, if the current input is less than a predetermined multiple of the noise threshold value, an exponential smoothing of the filter coefficient is performed.
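The coefficient floor and the conditional smoothing can be sketched as a single refinement step. This assumes the floor is applied before smoothing. The function name and the values of `h_min`, `smooth_multiple`, and `h_alpha` are hypothetical.

```python
def refine_gain(h_raw, h_prev, x, noise, h_min=0.1, smooth_multiple=2.0, h_alpha=0.8):
    """Apply the minimum-coefficient floor, then smooth H when the input looks noisy.

    h_min, smooth_multiple, and h_alpha are hypothetical tuning values.
    """
    # Minimum filter-coefficient threshold: H never drops below h_min, which
    # caps the attenuation (and thus the amount of noise reduction).
    h = max(h_raw, h_min)
    # Noise-like input: smooth H over time to suppress frame-to-frame flicker.
    if abs(x) < smooth_multiple * noise:
        h = h_alpha * h_prev + (1 - h_alpha) * h
    return h
```

Raising `h_min` trades less noise reduction for a more natural residual. Smoothing only in noise-like segments leaves the coefficient free to respond quickly during speech.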
The present invention is applicable to various noise canceling systems including, but not limited to, those described in the U.S. patent applications incorporated herein by reference. For example, the present invention is applicable to cellular phones, personal digital assistants (PDAs), audio applications, automobile acoustics, headphones, and microphone arrays. In addition, the present invention may be embodied as a computer program that drives a computer processor, implemented either as application software or in hardware.