I. Field of the Invention
The present invention relates to speech processing. More particularly, the present invention relates to a noise suppression system and method for use in speech processing.
II. Description of the Related Art
Transmission of voice by digital techniques has become widespread, particularly in cellular telephone and personal communication system (PCS) applications. This, in turn, has created an interest in improving speech processing techniques. One area in which improvements are being developed is that of noise suppression techniques.
Noise suppression in a speech communication system generally serves the purpose of improving the overall quality of the desired audio signal by filtering environmental background noise from the desired speech signal. This speech enhancement process is particularly necessary in environments having abnormally high levels of ambient background noise, such as an aircraft, a moving vehicle, or a noisy factory.
One noise suppression technique is the spectral subtraction, or spectral gain modification, technique. Using this approach, the input audio signal is divided into frequency channels, and particular frequency channels are attenuated according to their noise energy content. A background noise estimate for each frequency channel is utilized to generate a signal-to-noise ratio (SNR) of the speech in the channel, and the SNR is used to compute a gain factor for each channel. The gain factor then determines the attenuation for the particular channel. The attenuated channels are recombined to produce the noise-suppressed output signal.
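The spectral gain modification steps above can be illustrated with a short Python sketch. It does not reproduce any particular system. The naive DFT, the grouping of bins into channels through a hypothetical `channel_edges` list, and the Wiener-style gain rule `snr / (1 + snr)` are all assumptions chosen for brevity. The function names are likewise hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of the time-domain signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def suppress_frame(frame, channel_edges, noise_energy):
    """Spectral gain modification for one frame (hypothetical sketch).

    frame         -- real-valued time-domain samples
    channel_edges -- (lo, hi) bin ranges, hi exclusive, over bins 0..N//2
    noise_energy  -- background noise energy estimate for each channel
    Returns (noise-suppressed samples, per-channel gains).
    """
    spectrum = dft(frame)
    n = len(spectrum)
    gains = []
    for (lo, hi), noise in zip(channel_edges, noise_energy):
        # Channel energy from the non-negative-frequency bins in [lo, hi).
        energy = sum(abs(spectrum[k]) ** 2 for k in range(lo, hi))
        if noise <= 0:
            gain = 1.0  # no usable noise estimate: leave the channel untouched
        else:
            snr = energy / noise
            gain = snr / (1.0 + snr)  # assumed Wiener-style gain in [0, 1)
        gains.append(gain)
        for k in range(lo, hi):
            spectrum[k] *= gain
            if 0 < k < n - k:
                spectrum[n - k] *= gain  # scale the mirrored bin so the output stays real
    # Recombine the attenuated channels into a time-domain frame.
    return idft(spectrum), gains
```

For example, an 8-sample cosine at bin 1 split into channels `[(0, 2), (2, 5)]` with noise estimates `[1e-9, 1.0]` passes through almost unchanged, while the empty upper channel receives a gain near zero.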
In specialized applications involving relatively high background noise environments, most noise suppression techniques exhibit significant performance limitations. One example of such an application is the vehicle speakerphone option to a cellular mobile communication system. The speakerphone option provides hands-free operation for the automobile driver. The hands-free microphone is typically located farther from the user than a handset microphone would be, for example mounted overhead on the visor. The distant microphone delivers a poor SNR to the land-end party because of road and wind noise. Although the received speech at the land end is usually intelligible, continuous exposure to such background noise levels often increases listener fatigue.
For a noise suppression system to function properly, it is important to accurately determine the SNR of speech. However, it is difficult to accurately determine the SNR for the speech signal because of the limitations of currently available noise detectors. Spectral subtraction techniques update the background noise estimate during periods when speech is absent. When speech is absent, the measured spectral energy is attributed to noise, and the noise estimate is updated based on the measured spectral energy. Therefore, it is important to distinguish between periods of speech and absence of speech in order to obtain an accurate noise energy estimate for computation of the SNR.
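A minimal sketch of the gated noise update, assuming a first-order recursive average with a hypothetical smoothing factor `alpha`, might look as follows. The estimate absorbs the measured channel energy only when the frame is classified as speech-free, and the channel SNR is then taken against that estimate. Function names and parameter values are illustrative assumptions.

```python
import math


def update_noise_estimate(noise_energy, channel_energy, speech_present, alpha=0.1):
    """Per-channel background noise update, gated by a speech decision.

    When speech is absent, the measured channel energy is attributed to noise
    and blended into the estimate (alpha is an assumed smoothing factor).
    When speech is present, the estimate is held unchanged.
    """
    if speech_present:
        return list(noise_energy)
    return [(1.0 - alpha) * n + alpha * e for n, e in zip(noise_energy, channel_energy)]


def channel_snr_db(channel_energy, noise_energy, floor=1e-12):
    """Channel SNR in dB; the floor avoids log(0) and division by zero."""
    return 10.0 * math.log10(max(channel_energy, floor) / max(noise_energy, floor))
```

The quality of the SNR therefore depends directly on the speech decision: a frame wrongly classified as speech freezes the estimate, and a frame wrongly classified as noise pulls speech energy into it.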
An exemplary technique for speech detection uses a voice metric calculator to perform the noise update decision. A voice metric is a measurement of the overall voice-like characteristics of the channel energy. First, raw SNR estimates are used to index a voice metric table to obtain voice metric values for each channel. The individual channel voice metric values are summed to create an energy parameter, which is compared with a background noise update threshold. If the voice metric sum meets or exceeds the threshold, the signal is deemed to contain speech. If the voice metric sum is below the threshold, the input frame is deemed to be noise, and a background noise update is performed. However, in the case of a high background noise condition, a sudden background noise, or an increasing noise source, the SNR measurements will be large, resulting in a high voice metric sum that prevents the noise estimate from being updated.
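The voice metric decision can be sketched as below. The table contents, the 1 dB quantization of the SNR index, and the threshold of 35 are hypothetical values chosen only to make the mechanism concrete; they are not taken from any standard or product.

```python
import math

# Hypothetical voice metric table indexed by channel SNR quantized to 1 dB,
# saturating at the last entry. Values are illustrative only.
VOICE_METRIC_TABLE = [0, 0, 0, 1, 1, 2, 2, 3, 4, 5,
                      6, 7, 8, 9, 10, 11, 12, 13, 14, 15]


def voice_metric_sum(channel_energy, noise_energy):
    """Sum of per-channel voice metric values looked up from raw SNR estimates."""
    total = 0
    for e, n in zip(channel_energy, noise_energy):
        snr_db = 10.0 * math.log10(max(e, 1e-12) / max(n, 1e-12))
        index = min(max(int(snr_db), 0), len(VOICE_METRIC_TABLE) - 1)
        total += VOICE_METRIC_TABLE[index]
    return total


def is_noise_update_frame(channel_energy, noise_energy, threshold=35):
    """True if the frame is deemed noise (sum below threshold), so the estimate is updated."""
    return voice_metric_sum(channel_energy, noise_energy) < threshold
```

With eight channels and a stale noise estimate, a sudden 10 dB rise in background noise gives a metric of 6 per channel and a sum of 48, so under these assumed values the frame is treated as speech and the update is blocked, which is the failure mode described above.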
A refinement to the voice metric calculator technique measures the channel energy deviation. This method assumes that noise exhibits constant spectral energy over time, while speech exhibits variable spectral energy over time. Thus, the channel energy is integrated over time, and speech is detected if there is substantial channel energy deviation, while noise is detected if there is little channel energy deviation. A speech detector which measures channel energy deviation will detect a sudden increase in the level of noise. However, the channel energy deviation method provides an inaccurate result when the input speech signal is of constant energy. Furthermore, for an increasing noise source, changes in the input energy will cause the energy deviation to be large, preventing a noise estimate update even though an update is necessary.
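One way to sketch a channel energy deviation detector, assuming a recursive integrator over log channel energy with a hypothetical smoothing factor `beta` and a hypothetical threshold `threshold_db`, is shown here. The class name and parameters are illustrative, not drawn from a specific implementation.

```python
import math


class EnergyDeviationDetector:
    """Speech detector based on channel energy deviation (illustrative sketch).

    A long-term average of each channel's log energy is maintained with a
    first-order recursive integrator. A frame is classified as speech when the
    summed absolute deviation of its log energies from that average exceeds
    threshold_db. Both beta and threshold_db are assumed values.
    """

    def __init__(self, beta=0.1, threshold_db=12.0):
        self.beta = beta
        self.threshold_db = threshold_db
        self.avg_db = None  # initialized from the first frame

    def update(self, channel_energy):
        """Return True if the frame is classified as speech."""
        frame_db = [10.0 * math.log10(max(e, 1e-12)) for e in channel_energy]
        if self.avg_db is None:
            self.avg_db = list(frame_db)
        deviation = sum(abs(f - a) for f, a in zip(frame_db, self.avg_db))
        # Integrate the channel energy over time.
        self.avg_db = [(1.0 - self.beta) * a + self.beta * f
                       for a, f in zip(self.avg_db, frame_db)]
        return deviation > self.threshold_db
```

Steady input of any kind yields zero deviation, so constant-energy speech is missed, while a step change in noise level produces a large deviation and is classified as speech; both limitations noted above follow directly from this structure.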
In addition to an accurate speech detector, the noise suppression system must appropriately adjust channel gains. Channel gains should be adjusted so that noise suppression is achieved without sacrificing voice quality. One method of channel gain adjustment computes the gain as a function of the total noise estimate and the SNR of the speech signal. In general, an increase in the total noise estimate results in a lower gain factor for a given SNR. A lower gain factor is indicative of a greater attenuation factor. This technique imposes a minimum gain value to prevent excessive attenuation when the total noise estimate is very high. Using a hard-clamped minimum gain value introduces a tradeoff between noise suppression and voice quality. When the clamp is relatively low, noise suppression is improved but voice quality is degraded. When the clamp is relatively high, noise suppression is degraded but voice quality is improved.
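The gain computation with a hard minimum clamp can be sketched as follows. The linear rule in dB, its coefficients `slope` and `noise_weight`, and the default clamp of -13 dB are hypothetical; they only reproduce the qualitative behavior described above, namely lower gain for higher total noise at a given SNR and a floor that attenuation cannot pass.

```python
def channel_gain_db(snr_db, total_noise_db, min_gain_db=-13.0,
                    slope=0.4, noise_weight=0.3):
    """Channel gain in dB from SNR and total noise estimate (hypothetical rule).

    Higher total noise lowers the gain for a given SNR; the result is capped
    at 0 dB (no amplification) and hard-clamped at min_gain_db.
    """
    raw = slope * snr_db - noise_weight * total_noise_db
    return max(min_gain_db, min(0.0, raw))


def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)
```

Lowering `min_gain_db` lets very noisy channels be attenuated further (better suppression, more speech distortion), and raising it does the opposite, which is the clamp tradeoff described above.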
In order to provide an improved noise suppression system, the limitations of the current techniques for speech detection and channel gain computation need to be addressed. These problems and deficiencies are solved by the present invention in the manner described below.