1. Field of Invention
The present invention relates generally to audio processing and more particularly to adaptive classification of audio sources.
2. Description of Related Art
Currently, there are many methods for reducing background noise in an adverse audio environment. One such method is to use a noise suppression system that always provides an output noise that is a fixed bound lower than the input noise. Typically, the fixed noise suppression is in the range of 12-13 dB. The noise suppression is fixed to this conservative level in order to avoid producing speech distortion, which will be apparent with higher noise suppression.
In order to provide higher noise suppression, dynamic noise suppression systems based on signal-to-noise ratios (SNR) have been utilized. Unfortunately, SNR, by itself, is not a very good predictor of an amount of speech distortion because of the existence of different noise types in the audio environment and the non-statutory nature of a speech source (e.g., people). SNR is a ratio of how much louder speech is than noise. The SNR may be adversely impacted when speech energy (i.e., the signal) fluctuates over a period of time. The fluctuation of the speech energy can be caused by changes of intensity and sequences of words and pauses.
Additionally, stationary and dynamic noises may be present in the audio environment. The SNR averages all of these stationary and non-stationary noises and speech. There is no consideration as to the statistics of the noise signal; only what the overall level of noise is.
In some prior art systems, a fixed classification threshold discrimination system may be used to assist in noise suppression. However, fixed classification systems are not robust. In one example, speech and non-speech elements may be classified based on fixed averages. However, if conditions change, such as when the speaker moves the microphone away from their mouth or noise suddenly gets louder, the fixed classification system will erroneously classify the speech and non-speech elements. As a result, speech elements may be suppressed and overall performance may significantly degrade.