In a normal voice call, each user alternates between talking and listening, so inactive speech stages occur during the call. Under normal circumstances, the inactive speech stages of the calling party and the called party together occupy more than 50% of the total voice coding duration. During an inactive speech stage, the signal contains only background noise, which usually carries no useful information. For this reason, a voice signal processing procedure uses a Voice Activity Detection (VAD) algorithm to distinguish active speech from inactive speech, and processes the two with different methods. Many currently adopted voice coding standards, such as Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB), support the VAD function.
However, the VAD of these coders does not achieve good performance under all typical background noises. In particular, its efficiency is relatively low under non-stationary noise. The VAD may also misclassify a music signal, which greatly reduces the performance of the corresponding processing algorithm. In addition, current VAD technologies suffer from inaccurate judgment: some have relatively low detection accuracy for the several frames immediately before a voice segment, and others have relatively low detection accuracy for the several frames immediately after a voice segment.
An effective solution for the above problems has not been proposed yet.
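For illustration only, the frame-level decision between active and inactive speech described above can be sketched as a fixed energy threshold. This is a minimal sketch under stated assumptions, not the VAD of AMR, AMR-WB, or any other standard: the names `frame_energy_db` and `simple_vad`, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate), and the -40 dB threshold are all hypothetical choices made for this example.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB; the floor avoids taking log of zero."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))

def simple_vad(samples, frame_len=160, threshold_db=-40.0):
    """Return one flag per full frame: True = active, False = inactive.

    Assumed rule for illustration: a frame is active when its energy
    exceeds a fixed threshold. Trailing samples that do not fill a
    whole frame are ignored.
    """
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        decisions.append(frame_energy_db(frame) > threshold_db)
    return decisions

# Synthetic input: two frames of faint background, then two louder frames.
quiet = [0.001 * math.sin(0.3 * n) for n in range(320)]
loud = [0.5 * math.sin(0.3 * n) for n in range(320)]
flags = simple_vad(quiet + loud)
```

A fixed threshold of this kind also shows why the problems above arise: when the background noise level changes over time, or when the input is music, frame energy alone no longer separates active speech from inactive speech, and low-energy frames at the start or end of a voice segment are easily misjudged.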