In a normal voice call, a user sometimes speaks and sometimes listens, so inactive speech phases appear during the call. In normal cases, the total inactive speech phase of both parties exceeds 50% of the total voice coding time of the two parties. In an inactive speech phase there is only background noise, which generally carries no useful information. Exploiting this fact, voice signal processing uses a Voice Activity Detection (VAD for short) algorithm to distinguish active speech from inactive speech, and processes the two with different methods. Many voice coding standards, such as Adaptive Multi-Rate (AMR for short) and Adaptive Multi-Rate Wideband (AMR-WB for short), support the VAD function. In terms of efficiency, however, the VAD of these encoders cannot achieve good performance under all typical background noises. In particular, their VAD efficiency is low under non-stationary noise. For music signals, the VAD sometimes makes detection errors, resulting in significant quality degradation of the corresponding processing algorithm.
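As a purely illustrative sketch of the active/inactive decision described above, the following Python fragment labels each frame by comparing its energy with a fixed threshold. It is not the VAD of AMR or AMR-WB, nor a method taken from this document; the function names, the frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate), and the threshold of -40 dB are all hypothetical choices made for the example.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB, floored to avoid log(0)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))

def simple_vad(samples, frame_len=160, threshold_db=-40.0):
    """Return one flag per full frame: True = active, False = inactive.

    Assumed parameters: frame_len and threshold_db are illustrative only.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags

# Synthetic input: one frame of low-level noise-like signal,
# followed by one frame of a much louder tone.
quiet = [0.001 * math.sin(0.3 * n) for n in range(160)]
loud = [0.5 * math.sin(0.3 * n) for n in range(160)]
flags = simple_vad(quiet + loud)
print(flags)
```

A fixed energy threshold of this kind would be expected to show exactly the weaknesses noted above: when the background noise level changes over time, or when the input is music, frame energy alone is a poor indicator of speech activity.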