In a normal voice call, each user alternately talks and listens, so the call contains voice-inactive phases. Under normal circumstances, the voice-inactive phases of both parties together account for more than 50% of the total voice coding time. During these phases only background noise is present, and it usually carries no useful information. Audio signal processing exploits this fact: a voice activity detection (VAD) algorithm separates active voice from inactive voice, and each is then processed by a different method. Many modern voice coding standards, such as AMR and AMR-WB, support a VAD function.

In terms of efficiency, however, the VADs of these encoders do not perform well under all typical background noises, and their efficiency is relatively low in non-stationary noise in particular. For music signals, these VADs sometimes make detection errors, which noticeably degrades the quality of the corresponding processing algorithms. In addition, related VAD technologies make inaccurate decisions at the boundaries of voice sections: some misjudge the few frames before a voice section, while others misjudge the few frames after it.
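To make the frame-level active/inactive decision and the boundary problem concrete, the following is a minimal sketch of a toy energy-threshold VAD with a "hangover" that holds the active decision for a few frames after speech ends. It is an illustrative assumption only, not the AMR or AMR-WB VAD (the standardized detectors are considerably more elaborate). The function names `frame_energy_db` and `simple_vad` and the parameters `threshold_db` and `hangover_frames` are hypothetical.

```python
import math
from typing import List, Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB (a small floor avoids log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


def simple_vad(frames: Sequence[Sequence[float]],
               threshold_db: float = -30.0,
               hangover_frames: int = 2) -> List[bool]:
    """Hypothetical toy VAD: return one active (True) / inactive (False) flag per frame.

    A frame is active if its energy exceeds threshold_db. After the last
    above-threshold frame, the active decision is held for hangover_frames
    extra frames so that low-energy word endings are not cut off.
    """
    decisions: List[bool] = []
    hang = 0
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            decisions.append(True)
            hang = hangover_frames  # re-arm the hangover on every active frame
        elif hang > 0:
            decisions.append(True)  # still inside the hangover window
            hang -= 1
        else:
            decisions.append(False)
    return decisions


if __name__ == "__main__":
    quiet = [0.001] * 160  # about -60 dB: background only
    loud = [0.5] * 160     # about -6 dB: "speech"
    frames = [quiet] * 3 + [loud] * 2 + [quiet] * 4
    print(simple_vad(frames))
    # [False, False, False, True, True, True, True, False, False]
```

The hangover illustrates the boundary trade-off described above: a longer hangover protects trailing low-energy speech but marks some noise-only frames after the voice section as active, while a fixed energy threshold like this one is easily fooled by non-stationary noise or music whose energy crosses the threshold.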