1. Field of the Invention
The present invention relates to a voice activity detection apparatus and a voice activity detection method.
2. Related Background Art
Discontinuous transmission (DTX) is a technology commonly used in mobile telephony services and in Internet telephony services to reduce transmission power or save transmission bandwidth. In DTX operation, an inactive period in an input signal, such as silence or background noise, may be transmitted at a lower bitrate than an active period containing speech, music, or special tones, or transmission may be stopped during the inactive period. Voice activity detection (VAD), one of the key components of DTX operation, decides whether or not the current period of the input signal to be encoded contains only inactive information.
For example, the VAD apparatus described in Patent Document 1 listed below uses the autocorrelation of an input signal, taking advantage of the periodicity of the human voice. More specifically, this VAD apparatus computes the delay at which the autocorrelation of the input signal reaches its maximum within a predetermined interval, and classifies the input signal as active if that delay falls within the range of pitch periods of the human voice, and as inactive if it falls outside that range.
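The delay-based decision described above can be illustrated with a minimal sketch. It is not the implementation of Patent Document 1: the function names, the 8 kHz sample rate, the search interval of 10 to 200 samples, and the 60 to 400 Hz pitch range are all assumptions chosen for illustration.

```python
import math


def best_autocorr_delay(frame, min_lag, max_lag):
    """Return the delay in [min_lag, max_lag] that maximizes the autocorrelation."""
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this delay.
        val = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def is_active(frame, sample_rate=8000, search=(10, 200), pitch_hz=(60.0, 400.0)):
    """Classify a frame as active if its best delay is a plausible pitch period.

    All default values are illustrative assumptions, not values from the source.
    """
    delay = best_autocorr_delay(frame, search[0], search[1])
    shortest = sample_rate / pitch_hz[1]  # pitch period of the highest pitch, in samples
    longest = sample_rate / pitch_hz[0]   # pitch period of the lowest pitch, in samples
    return shortest <= delay <= longest


def tone(freq_hz, n_samples=400, sample_rate=8000):
    """Generate a pure sine tone, used here only as a test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]
```

Under these assumed parameters, a 100 Hz tone yields a delay of 80 samples and is classified as active, whereas a 1 kHz tone yields a delay of 16 samples, which lies below the assumed pitch-period range, and is classified as inactive.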
Furthermore, the VAD apparatus described in Non-patent Document 1 listed below estimates background noise from an input signal and decides whether the input signal is active or inactive based on the ratio of the input signal to the estimated noise (SNR). More specifically, this VAD apparatus computes the delay at which the autocorrelation of the input signal reaches its maximum within a predetermined interval, and the delay at which the weighted autocorrelation of the input signal reaches its maximum. It estimates the background noise level, adapting the estimation method on the basis of the continuity of these delays (i.e., a small variation between successive delays over a predetermined period of time). It then decides that the input signal is active if the SNR is equal to or greater than a threshold computed adaptively from the estimated background noise level, and inactive if the SNR is smaller than the threshold.
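The SNR-based decision can likewise be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm of Non-patent Document 1: the class name, the smoothing factors, and the threshold rule are hypothetical, and the continuity of the delays is passed in as a precomputed flag rather than derived from the autocorrelation.

```python
import math


def snr_db(energy, noise):
    """Ratio of frame energy to estimated noise energy, in decibels."""
    return 10.0 * math.log10(max(energy, 1e-12) / max(noise, 1e-12))


class SnrVad:
    """Toy SNR-based VAD with an adaptive noise estimate (illustrative only)."""

    def __init__(self, noise=1.0, fast=0.1, slow=0.001):
        self.noise = noise  # running background-noise energy estimate
        self.fast = fast    # assumed smoothing factor when delays are not continuous
        self.slow = slow    # assumed smoothing factor when delays are continuous

    def threshold_db(self):
        # Hypothetical rule: the threshold drifts with the noise level
        # and is clamped to the range 3 dB to 9 dB.
        noise_db = 10.0 * math.log10(max(self.noise, 1e-12))
        return max(3.0, min(9.0, 6.0 - 0.1 * noise_db))

    def decide(self, energy, delays_stable):
        """Return True if the frame is active, then update the noise estimate."""
        active = snr_db(energy, self.noise) >= self.threshold_db()
        # Continuous delays suggest a periodic (voiced) signal, so the noise
        # estimate is adapted more slowly in that case.
        alpha = self.slow if delays_stable else self.fast
        self.noise = (1.0 - alpha) * self.noise + alpha * energy
        return active
```

In this sketch, adapting the noise estimate slowly while the delays are continuous keeps sustained voiced speech from being absorbed into the noise estimate; the actual adaptation rule in the cited specification is more elaborate.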
[Patent Document 1] Japanese Unexamined Patent Publication No. 2002-162982
[Non-patent Document 1] 3GPP TS 26.094 V3.0.0 (http://www.3gpp.org/ftp/Specs/html-info/26094.htm)