A discontinuous transmission (DTX) system is a widely used voice communication system. During the silence periods of a call, voice frames are encoded and transmitted discontinuously, which reduces the channel bandwidth occupied while still ensuring adequate subjective call quality.
Voice signals are usually classified into two types: active voice signals and silence signals. An active voice signal is a signal that includes call voice, and a silence signal is a signal that does not. In a DTX system, the active voice signal is transmitted continuously, and the silence signal is transmitted discontinuously. Discontinuous transmission of the silence signal is implemented as follows: an encoder intermittently encodes and sends a special encoding frame, namely a silence descriptor (SID) frame, and no other signal frame is encoded between two adjacent SID frames. Based on the discontinuously received SID frames, a decoder autonomously generates noise that is comfortable for the user to hear, referred to as comfort noise (CN). The CN is not intended to accurately restore the original silence signal; it is intended to meet the subjective hearing quality requirement of the user at the decoder, so that the user does not feel uncomfortable.
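The encoder-side behavior described above can be illustrated with a minimal sketch. The function name dtx_schedule, the frame labels, and the parameter sid_interval (the number of frames between two adjacent SID frames) are assumptions introduced only for illustration. The per-frame voice activity flags are assumed to come from a voice activity detector that is not shown.

```python
from typing import List


def dtx_schedule(vad_flags: List[bool], sid_interval: int = 8) -> List[str]:
    """Label each frame as SPEECH, SID, or NO_DATA (hypothetical labels).

    vad_flags[i] is True when frame i is an active voice frame.
    sid_interval is an assumed spacing between adjacent SID frames.
    """
    schedule = []
    frames_since_sid = None  # None: no SID sent yet in this silence period
    for active in vad_flags:
        if active:
            # Active voice is transmitted continuously, one frame per frame.
            schedule.append("SPEECH")
            frames_since_sid = None
        elif frames_since_sid is None or frames_since_sid >= sid_interval - 1:
            # First silence frame, or the SID spacing has elapsed: send a SID.
            schedule.append("SID")
            frames_since_sid = 0
        else:
            # Nothing is encoded between two adjacent SID frames.
            schedule.append("NO_DATA")
            frames_since_sid += 1
    return schedule
```

In this sketch, a silence period costs one SID frame per sid_interval frames instead of one encoded frame per frame, which is where the bandwidth saving of a DTX system comes from.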
To obtain better subjective hearing quality at the decoder, the quality of the transition from an active voice band to a CN band is critical. One effective method for obtaining a smoother transition is as follows: during the transition from an active voice band to a silence band, the encoder does not switch to the discontinuous transmission state immediately, but delays for an additional period of time. During this period, some silence frames at the beginning of the silence band are still treated as active voice frames and are continuously encoded and sent; that is, a hangover interval of continuous transmission is set. The advantage of this measure is that the decoder can make full use of the silence signal within the hangover interval to better estimate and extract features of the silence signal, and thereby generate a better CN.
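The hangover interval described above can be sketched as follows. The function name apply_hangover and the parameter hangover_len (the number of silence frames treated as active voice frames) are hypothetical and are not taken from any particular codec.

```python
from typing import List


def apply_hangover(vad_flags: List[bool], hangover_len: int) -> List[bool]:
    """Return, per frame, whether the frame is continuously encoded and sent.

    vad_flags[i] is True when frame i is an active voice frame.
    hangover_len is an assumed hangover length in frames.
    """
    continuous = []
    remaining = 0  # silence frames still to be treated as active
    for active in vad_flags:
        if active:
            # Every active voice frame re-arms the hangover interval.
            remaining = hangover_len
            continuous.append(True)
        elif remaining > 0:
            # Silence frame inside the hangover interval: still sent.
            remaining -= 1
            continuous.append(True)
        else:
            # Hangover exhausted: the frame falls under discontinuous transmission.
            continuous.append(False)
    return continuous
```

For example, with hangover_len set to 2, the first two silence frames after a voice activity are still sent as if they were active voice frames, which gives the decoder real silence-signal frames from which to estimate the noise features.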
However, in the prior art, the hangover mechanism is not effectively controlled. The condition for triggering the hangover mechanism is relatively simple: whether to trigger it is determined merely by checking whether enough active voice frames have been continuously encoded and sent by the end of a voice activity. Once the hangover mechanism is triggered, a hangover interval of a fixed length is executed compulsorily. However, a fixed-length hangover interval is not always necessary when enough active voice frames have been continuously encoded and sent. For example, when the background noise of the communication environment is stable, the decoder can obtain a CN of good quality even if no hangover interval, or only a short one, is set. Therefore, this simple manner of controlling the hangover mechanism wastes communication bandwidth.
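The simple trigger condition described above may be sketched as follows. This is a simplified illustration under stated assumptions, not the control logic of any specific prior-art codec: the function name fixed_hangover and the parameters min_burst (the number of consecutive active voice frames required to trigger the hangover) and hangover_len (the fixed hangover length) are hypothetical.

```python
from typing import List


def fixed_hangover(vad_flags: List[bool], min_burst: int,
                   hangover_len: int) -> List[bool]:
    """Return, per frame, whether the frame is continuously encoded and sent.

    The hangover is triggered only by counting active voice frames; the
    stability of the background noise is never examined.
    """
    continuous = []
    burst = 0      # consecutive active voice frames in the current voice activity
    remaining = 0  # hangover frames still to be sent
    for active in vad_flags:
        if active:
            burst += 1
            if burst >= min_burst:
                # Enough active frames: arm a hangover of fixed length.
                remaining = hangover_len
            continuous.append(True)
        else:
            burst = 0
            if remaining > 0:
                # The fixed-length hangover is executed compulsorily.
                remaining -= 1
                continuous.append(True)
            else:
                continuous.append(False)
    return continuous
```

In this sketch, every voice activity of at least min_burst frames is followed by exactly hangover_len additional transmitted frames, whether or not the background noise is stable. Those additional frames are the bandwidth that is wasted when the decoder could already generate a good CN without them.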