1. Field of Invention
This disclosure relates generally to methods and apparatus for noise level/spectrum estimation and speech activity detection and more particularly, to the use of a probabilistic model for estimating noise level and detecting the presence of speech.
2. Description of Related Art
Communication technologies continue to evolve in many arenas, often presenting newer challenges. With the advent of mobile phones and wireless headsets one can now have a true full-duplex conversation in very harsh environments, i.e. those having low signal to noise ratios (SNR). Signal enhancement and noise suppression becomes pivotal in these situations. The intelligibility of the desired speech is enhanced by suppressing the unwanted noisy signals prior to sending the signal to the listener at the other end. Detecting the presence of speech within noisy backgrounds is one important component of signal enhancement and noise suppression. To achieve improved speech detection, some systems divide an incoming signal into a plurality of different time/frequency frames and estimate the probability of the presence of speech in each frame.
One of the biggest challenges in detecting the presence of speech is tracking the noise floor, particularly the non-stationary noise level using a single microphone/sensor. Speech activity detection is widely used in modern communication devices, especially for modern mobile devices operating under low signal-to-noise ratios such as cell phones and wireless headset devices. In most of these devices, signal enhancement and noise suppression are performed on the noisy signal prior to sending it to the listener at the other end; this is done to improve the intelligibility of the desired speech. In signal enhancement/noise suppression a speech or voice activity detector (VAD) is used to detect the presence of the desired speech in a noise contaminated signal. This detector may generate a binary decision of presence or absence of speech or may also generate a probability of speech presence.
One challenge in detecting the presence of speech is determining the upper and lower bounds of the level of background noise in a signal, also known as the noise “ceiling” and “floor”. This is particularly true with non-stationary noise using a single microphone input. Further, it is even more challenging to keep track of rapid variations in the noise levels due to the physical movements of the device or the person using the device.