Noise reduction is a technique widely used in speech applications. When a microphone captures human speech and converts it into speech signals for further processing, noise, such as background ambient noise, may also be captured along with the desired speech signal. Thus, the overall captured (or observed) signal from a microphone may include both the desired speech signal and a noise component. It is usually desirable to remove or reduce the noise component in the observed signal to a specified level before any further processing of the human speech.
Human speech captured using a single microphone is commonly referred to as a single-channel speech input. Current art for single-channel noise reduction (the process of removing or reducing the noise component in the single-channel speech input) models an input signal y(t) captured at a microphone as a speech signal x(t) plus an additive noise component v(t), that is, y(t)=x(t)+v(t), where t is a time index. In practice, y(t) is processed as a series of frames along the time axis. Using a time-frequency transformation such as the Short-Time Fourier Transform (STFT), the input signal y(t) sensed by the microphone is transformed into a time-frequency domain representation Y(k, m), where ‘k’ is a frequency index and ‘m’ is a time-frame index. Because the transformation is linear, the additive model carries over: Y(k, m)=X(k, m)+V(k, m). The statistics of the noise component V(k, m) may be estimated during silence periods (periods in which no human voice activity is detected). To reduce noise, current art applies a noise reduction filter H(k, m) to the input signal Y(k, m). The noise reduction filter H(k, m) is designed to minimize the spectral energy of the noise component V(k, m) for the current frame m. Because it reduces noise based only on the current time frame m, current art implicitly assumes that Y(k, m) is uncorrelated from one frame to another.
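The single-frame scheme described above can be illustrated with a minimal sketch in plain Python (no third-party dependencies). Several details are assumptions not stated in the text: a Hann window, 32-sample frames with a 16-sample hop, a naive DFT in place of an FFT, a voice-activity decision that is simply given (the leading frames are known to be silent), and a Wiener-type gain H(k, m) = max(1 - phi_V(k) / |Y(k, m)|^2, floor) chosen only as an example filter, since the text does not specify how H(k, m) is designed. All function names are hypothetical.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann analysis window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) DFT; adequate for the short frames used here."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def stft(y, frame_len=32, hop=16):
    """Return Y as a list of frames: Y[m][k] is bin k of time frame m."""
    w = hann(frame_len)
    return [dft([y[start + t] * w[t] for t in range(frame_len)])
            for start in range(0, len(y) - frame_len + 1, hop)]


def estimate_noise_psd(Y, silent_frames):
    """Per-bin noise power: mean of |Y(k, m)|^2 over frames flagged as silence."""
    n_bins = len(Y[0])
    return [sum(abs(Y[m][k]) ** 2 for m in silent_frames) / len(silent_frames)
            for k in range(n_bins)]


def single_frame_gain(Y_m, noise_psd, floor=0.1):
    """Hypothetical Wiener-type gain H(k, m), computed from frame m alone."""
    gains = []
    for k, y in enumerate(Y_m):
        power = abs(y) ** 2
        g = 1.0 - noise_psd[k] / power if power > 0 else 0.0
        gains.append(max(g, floor))  # the floor keeps gains in [floor, 1]
    return gains


def reduce_noise(Y, noise_psd, floor=0.1):
    """Apply H(k, m) frame by frame; no information is shared across frames."""
    return [[h * y for h, y in zip(single_frame_gain(Y_m, noise_psd, floor), Y_m)]
            for Y_m in Y]


# Demo: 256 samples of noise only, then a 500 Hz tone standing in for speech.
random.seed(0)
fs, n = 8000, 1024
speech = [0.0] * 256 + [math.sin(2 * math.pi * 500 * t / fs) for t in range(n - 256)]
noise = [random.gauss(0.0, 0.1) for _ in range(n)]
y = [s + v for s, v in zip(speech, noise)]  # y(t) = x(t) + v(t)

Y = stft(y)
# Assume a voice-activity detector flagged every frame (start 16*m, length 32)
# lying entirely inside the silent lead-in.
silent = [m for m in range(len(Y)) if 16 * m + 32 <= 256]
noise_psd = estimate_noise_psd(Y, silent)
X_hat = reduce_noise(Y, noise_psd)
```

Because `reduce_noise` computes each frame's gain from that frame's own spectrum and a fixed noise estimate, it reflects the assumption noted above that Y(k, m) is treated as uncorrelated from one frame to another.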