In a communication system, especially in a wireless communication system or a mobile communication system, channel bandwidth is a scarce resource. According to statistics, in a bi-directional call, the talk time of the two parties accounts for only about half of the total call time, and the call is in a silence state during the other half. If the communication system transmits signals only when people talk and stops transmitting signals in the silence state, but cannot assign the bandwidth occupied in the silence state to other communication services, the limited channel bandwidth resources are severely wasted.
To make full use of the channel resources, in the prior art, the time when the two parties of the call start to talk and the time when they stop talking are detected by using a Voice Activity Detection (VAD) technology, that is, the time when the voice is activated is acquired, so that the channel bandwidth can be assigned to other communication services when the voice is not activated. With the development of the communication network, the VAD technology may also detect other input signals, such as ring back tones. In a VAD system based on the VAD technology, whether an input signal is a foreground signal or background noise is usually judged according to a preset decision criterion that includes decision parameters and decision logics. Foreground signals include voice signals, music signals, and Dual Tone Multi Frequency (DTMF) signals, and background noise includes none of these signals. Such a judgment process is also called a VAD decision.
At the early stage of the development of the VAD technology, a static decision criterion was adopted, that is, no matter what the characteristics of an input signal are, the decision parameters and decision logics of the VAD remain unchanged. For example, in the G.729 standard-based VAD technology, regardless of the type of the input signal, the Signal to Noise Ratio (SNR), and the characteristics of the background noise, the same group of decision parameters is used to perform the VAD decision with the same group of decision logics and decision thresholds. Because the G.729 standard-based VAD technology is designed for a high SNR condition, its performance degrades in a low SNR condition. With the development of the VAD technology, a dynamic decision criterion was proposed, in which the VAD technology can select different decision parameters and/or different decision thresholds according to different characteristics of the input signal and then judge whether the input signal is a foreground signal or background noise. Because the dynamic decision criterion determines the decision parameters or decision logics according to specific features of the input signal, the decision process is optimized and the decision efficiency and decision accuracy are enhanced, thereby improving the performance of the VAD decision. Further, if the dynamic decision criterion is adopted, different VAD outputs can be set for input signals with different characteristics according to specific application demands. For example, when an operator hopes to transmit a certain amount of background information about some speakers in the VAD system, a VAD decision tendency can be set for the case in which the background noise contains a greater amount of information, so that a frame of such background noise is more easily judged to be a voice frame.
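The difference between the static and the dynamic decision criterion can be illustrated by a minimal sketch. The sketch assumes a purely energy-based decision, which is a simplification and is not the decision logic of G.729 or of any other standard. The function names, the fixed threshold of -30 dB, the noise boundary of -50 dB, and the margins of 10 dB and 5 dB are hypothetical values chosen only for illustration.

```python
import math


def frame_energy_db(frame):
    """Mean power of one frame in dB (the floor avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))


def static_vad(frame, threshold_db=-30.0):
    """Static criterion: the same fixed threshold for every input signal.

    The default threshold is a hypothetical value, not taken from a standard.
    """
    return frame_energy_db(frame) > threshold_db


def dynamic_vad(frame, noise_db):
    """Dynamic criterion: the threshold follows the estimated noise level.

    The margins and the -50 dB boundary are hypothetical illustration values.
    """
    margin_db = 10.0 if noise_db < -50.0 else 5.0
    return frame_energy_db(frame) > noise_db + margin_db


# Quiet speech over a very quiet background (about -34 dB frame energy).
quiet_speech = [0.02] * 160
# Loud background noise with no speech (about -26 dB frame energy).
loud_noise = [0.05] * 160

static_on_speech = static_vad(quiet_speech)                 # missed speech
dynamic_on_speech = dynamic_vad(quiet_speech, noise_db=-60.0)
static_on_noise = static_vad(loud_noise)                    # false alarm
dynamic_on_noise = dynamic_vad(loud_noise, noise_db=-26.0)
```

Under these assumed values, the fixed threshold misses the quiet speech frame and flags the loud noise frame, whereas the noise-dependent threshold classifies both frames as intended.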
Currently, dynamic decision has been achieved in the Adaptive Multi-Rate (AMR) voice encoder. The AMR can dynamically adjust the decision threshold, the hangover length, and the hangover trigger condition of the VAD according to the level of the background noise in the input signal.
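The role of the hangover length and the hangover trigger condition can be sketched as follows. This is a simplified illustration and not the AMR algorithm: the mapping from noise level to hangover length, the boundary of -40 dB, and the trigger of two consecutive active frames are hypothetical assumptions.

```python
def hangover_length(noise_db):
    """Hypothetical mapping: a noisier background gets a longer hangover.

    The returned value is a number of frames; the values are illustrative.
    """
    return 6 if noise_db > -40.0 else 3


def apply_hangover(raw_flags, noise_db, trigger_run=2):
    """Extend active decisions by a hangover period after a speech burst.

    The hangover is armed only after `trigger_run` consecutive active
    frames (the hypothetical trigger condition), so an isolated active
    frame is not extended.
    """
    hang = hangover_length(noise_db)
    out = []
    run = 0          # length of the current run of active frames
    remaining = 0    # hangover frames still available
    for active in raw_flags:
        if active:
            run += 1
            if run >= trigger_run:
                remaining = hang
            out.append(True)
        else:
            run = 0
            if remaining > 0:
                remaining -= 1
                out.append(True)
            else:
                out.append(False)
    return out


# A two-frame burst followed by silence, in a quiet background.
burst = [True, True, False, False, False, False, False]
smoothed_quiet = apply_hangover(burst, noise_db=-60.0)

# An isolated active frame does not meet the trigger condition.
isolated = apply_hangover([True, False, False], noise_db=-60.0)
```

In this sketch the burst is extended by three frames in the quiet background, and it would be extended by six frames if the assumed noise level were above -40 dB.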
However, when the existing AMR performs the VAD decision, the AMR can adapt only to the level of the background noise and cannot adapt to the fluctuation of the background noise. Thus, the performance of the VAD decision may differ considerably for input signals with different types of background noise. For example, at the same background noise level, the AMR has much higher VAD decision performance when the background noise is car noise, but the VAD decision performance is reduced significantly when the background noise is babble noise, causing a tremendous waste of the channel bandwidth resources.
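The limitation of adapting only to the noise level can be illustrated with a toy example. The per-frame energies below are made-up values, not measurements of car noise or babble noise, and the 5 dB margin and the function name are hypothetical. The two noise tracks have the same mean level but different fluctuation, and a threshold derived from the level alone treats them identically.

```python
from statistics import mean, pstdev


def false_alarm_rate(noise_db_frames, margin_db=5.0):
    """Fraction of pure-noise frames judged active by a level-only threshold.

    The threshold is the mean noise level plus a fixed margin, so it
    ignores how strongly the noise fluctuates around that level.
    """
    threshold = mean(noise_db_frames) + margin_db
    hits = sum(1 for e in noise_db_frames if e > threshold)
    return hits / len(noise_db_frames)


# Made-up frame energies in dB, both with a mean level of -40 dB.
steady_noise = [-40.0] * 10              # small fluctuation
fluctuating_noise = [-46.0, -34.0] * 5   # large fluctuation

steady_rate = false_alarm_rate(steady_noise)
fluctuating_rate = false_alarm_rate(fluctuating_noise)
steady_spread = pstdev(steady_noise)
fluctuating_spread = pstdev(fluctuating_noise)
```

With these assumed values, no steady noise frame exceeds the threshold, while half of the fluctuating noise frames do, although both tracks have the same level.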