Voice recognition systems require optimal noise estimation and reduction in order to distinguish speech-related signal characteristics from noise-related signals. Noise can result from environmental sources (such as other speakers, background noises, etc.) and/or from the detection system itself (e.g. microphone quality, processing methods and equipment, etc.). Speech detection systems use various methods for distinguishing speech-related signals from noise, based on audio recording/receiving of speech-related acoustic signals (e.g. using an acoustic microphone system for detection of sound).
Two such known methods are the Log-Spectral Amplitude (LSA) estimator and the optimally modified LSA (OMLSA) estimator. LSA estimators minimize the mean square error of the log spectra, based on Gaussian statistical models (see “Speech Enhancement for Non-Stationary Noise Environments”, Israel Cohen and Baruch Berdugo, Signal Processing, vol. 81, pp. 2403-2418, November 2001, referred to hereinafter as Cohen 1, which is incorporated by reference in its entirety to this application). OMLSA is based on the time-frequency distribution of the signal-to-noise ratio (SNR) of the detected audio signal.
The Minima Controlled Recursive Averaging (MCRA) approach is a noise estimation method used for speech enhancement or detection, which combines minimum tracking with recursive averaging, as described in Cohen 1, page 2405. This algorithm uses probability functions for estimating speech presence and for controlling adaptation of the noise spectrum estimate, by determining the ratio between the local energy of the noisy signal and its minimum within a specified time window. An improved MCRA (IMCRA) is described in another paper by Israel Cohen (see “Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging”, Israel Cohen, IEEE Trans. Speech Audio Processing, vol. 11, no. 5, pp. 466-475, September 2003, referred to hereinafter as Cohen 2, which is incorporated by reference in its entirety to this application). “The IMCRA involves averaging past spectral power values, using a time-varying frequency-dependent smoothing parameter that is adjusted by the signal presence probability.” (see Cohen 2, abstract).
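By way of illustration only, the combination of minimum tracking and recursive averaging described above may be sketched as follows. This is a simplified sketch and not the implementation of Cohen 1 or Cohen 2: it omits the smoothing across neighbouring frequency bins, and the function name mcra_noise_estimate, the parameter names and all default values are assumptions chosen for the example.

```python
def mcra_noise_estimate(power_frames, alpha_s=0.8, alpha_d=0.95,
                        alpha_p=0.2, delta=5.0, window=100):
    """Simplified MCRA-style noise power estimate, one value per bin per frame.

    power_frames: list of frames, each a list of noisy power values |Y|^2.
    All default parameter values are illustrative assumptions.
    """
    if not power_frames:
        return []
    num_bins = len(power_frames[0])
    smoothed = list(power_frames[0])   # recursively smoothed local energy
    s_min = list(power_frames[0])      # minimum tracked over the window
    s_tmp = list(power_frames[0])      # running minimum for the next window
    noise = list(power_frames[0])      # noise power estimate
    presence = [0.0] * num_bins        # smoothed speech presence probability
    estimates = [noise[:]]

    for frame_index in range(1, len(power_frames)):
        frame = power_frames[frame_index]
        restart = frame_index % window == 0
        for k in range(num_bins):
            # Recursive averaging of the local energy over time.
            smoothed[k] = alpha_s * smoothed[k] + (1 - alpha_s) * frame[k]
            # Minimum tracking within a window of `window` frames.
            if restart:
                s_min[k] = min(s_tmp[k], smoothed[k])
                s_tmp[k] = smoothed[k]
            else:
                s_min[k] = min(s_min[k], smoothed[k])
                s_tmp[k] = min(s_tmp[k], smoothed[k])
            # Ratio between local energy and its minimum decides speech presence.
            ratio = smoothed[k] / max(s_min[k], 1e-12)
            indicator = 1.0 if ratio > delta else 0.0
            presence[k] = alpha_p * presence[k] + (1 - alpha_p) * indicator
            # Higher presence probability slows adaptation of the noise estimate.
            alpha_eff = alpha_d + (1 - alpha_d) * presence[k]
            noise[k] = alpha_eff * noise[k] + (1 - alpha_eff) * frame[k]
        estimates.append(noise[:])
    return estimates


# Example: steady noise, a short loud burst standing in for speech, then noise.
frames = [[1.0]] * 50 + [[100.0]] * 10 + [[1.0]] * 30
estimates = mcra_noise_estimate(frames)
print(estimates[49][0], estimates[59][0], estimates[-1][0])
```

In this example the estimate follows the steady noise level, rises only slightly during the burst because the presence probability nearly freezes the update, and drifts back toward the noise level afterwards.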