Voice activity detection (VAD) is a key technology widely used in fields such as voice communications and man-machine interaction. VAD may also be referred to as sound activity detection (SAD). VAD is used to detect whether an input audio signal contains an active signal, where an active signal is defined relative to an inactive signal (such as environmental background noise or silence). Typical active signals include voice, music, and the like. The principle of VAD is that one or more feature parameters are extracted from the input audio signal, one or more feature values are determined according to the one or more feature parameters, and the one or more feature values are then compared with one or more thresholds to decide whether the signal is active.
In the prior art, an active signal detection method based on a segmental signal-to-noise ratio (SSNR) includes: dividing an input audio signal into multiple sub-band signals across the frequency band; calculating energy of the audio signal on each sub-band; comparing the energy of the audio signal on each sub-band with estimated energy of a background noise signal on the same sub-band, so as to obtain a signal-to-noise ratio (SNR) of the audio signal on each sub-band; determining an SSNR according to the sub-band SNR of each sub-band; and comparing the SSNR with a preset VAD decision threshold, where if the SSNR exceeds the VAD decision threshold, the audio signal is an active signal, or if the SSNR does not exceed the VAD decision threshold, the audio signal is an inactive signal.
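The sub-band energy and sub-band SNR steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names `subband_energies` and `subband_snrs`, the equal-width sub-band split, the naive DFT, and the expression of each sub-band SNR in decibels are assumptions made for the example and are not specified by the method described here.

```python
import cmath
import math


def subband_energies(frame, num_subbands):
    """Return the energy of `frame` in each of `num_subbands` sub-bands.

    Assumption: sub-bands are equal-width groups of DFT bins taken from
    the lower half of the spectrum; leftover bins are ignored.
    """
    n = len(frame)
    half = n // 2
    # Naive DFT power spectrum for bins 0 .. half-1 (fine for short frames).
    power = []
    for b in range(half):
        acc = sum(x * cmath.exp(-2j * cmath.pi * b * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    width = half // num_subbands
    return [sum(power[k * width:(k + 1) * width]) for k in range(num_subbands)]


def subband_snrs(signal_energy, noise_energy, floor=1e-12):
    """Compare per-sub-band signal energy with estimated noise energy.

    Assumption: the SNR is expressed in dB; `floor` avoids log(0) and
    division by zero.
    """
    return [10.0 * math.log10(max(e, floor) / max(nz, floor))
            for e, nz in zip(signal_energy, noise_energy)]
```

For example, a frame containing a single low-frequency tone concentrates its energy in the lowest sub-band, so only that sub-band shows a high SNR against a flat noise estimate.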
A typical method for calculating the SSNR is to add up all sub-band SNRs of the audio signal, and a result obtained is the SSNR. For example, the SSNR may be determined using formula 1.1:
$$\mathrm{SSNR} = \sum_{k=0}^{N-1} \mathrm{snr}(k) \qquad \text{Formula 1.1}$$
where k indicates the kth sub-band, snr(k) indicates a sub-band SNR of the kth sub-band, and N indicates a total quantity of sub-bands into which the audio signal is divided.
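As a minimal sketch of Formula 1.1 and the threshold comparison, the following assumes the sub-band SNRs have already been computed. The names `segmental_snr` and `is_active` and the numeric values in the usage note are hypothetical and chosen only for illustration.

```python
def segmental_snr(subband_snr_values):
    """Formula 1.1: the SSNR is the sum of snr(k) for k = 0 .. N-1."""
    return sum(subband_snr_values)


def is_active(subband_snr_values, vad_threshold):
    """Return True if the SSNR exceeds the VAD decision threshold.

    An SSNR equal to the threshold does not exceed it, so the frame is
    treated as inactive in that case.
    """
    return segmental_snr(subband_snr_values) > vad_threshold
```

With illustrative sub-band SNRs of 6.0, 3.0, and -1.0 and an illustrative threshold of 5.0, the SSNR is 8.0 and the frame would be classified as active.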
When an SSNR calculated using the foregoing method is used for active voice detection, an active voice may be misdetected.