In a normal voice call, each user sometimes talks and sometimes listens, so the call contains inactive speech stages. Under normal circumstances, the combined inactive speech stages of both parties in a call exceed 50% of the total speech-coding duration of the call. During an inactive speech stage there is only background noise, which usually carries no useful information. Taking advantage of this fact, audio signal processing uses a voice activity detection (VAD) algorithm to distinguish active speech from inactive speech, and applies different processing methods to each. Many modern speech-coding standards, such as AMR and AMR-WB, support the VAD function. In terms of efficiency, however, the VAD of these encoders cannot achieve good performance in all typical background noises; in particular, in non-stationary noise the VAD efficiency of these encoders is relatively low. For music signals, these VADs sometimes produce wrong detections, resulting in a noticeable decline in the quality of the corresponding processing algorithm.
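To illustrate the basic idea of distinguishing active speech from inactive speech, the following is a minimal, hypothetical sketch of a frame-energy VAD: each frame is classified by comparing its short-term energy against a noise-adaptive threshold. This toy example is not the AMR or AMR-WB VAD algorithm; the frame length and threshold ratio are illustrative assumptions.

```python
# Toy frame-energy VAD sketch (not the AMR/AMR-WB VAD algorithm).
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def simple_vad(samples, frame_len=160, threshold_ratio=4.0):
    """Return one boolean per frame (True = active speech).

    The noise floor is estimated as the minimum frame energy; a frame
    is marked active when its energy exceeds that floor by
    threshold_ratio. Both parameters are illustrative choices.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [frame_energy(f) for f in frames]
    noise_floor = min(energies) + 1e-12  # avoid a zero threshold
    return [e > threshold_ratio * noise_floor for e in energies]

# Example: quiet "background noise" frames surrounding one louder
# "speech" burst; only the loud frame should be detected as active.
noise = [0.01 * math.sin(0.3 * n) for n in range(320)]
speech = [0.5 * math.sin(0.2 * n) for n in range(160)]
decisions = simple_vad(noise + speech + noise)
```

A real encoder VAD additionally tracks a time-varying noise estimate and uses spectral features, which is precisely where performance degrades in non-stationary noise and for music signals.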