An important way to reduce the bit rate in high-performance speech encoders is to use comfort noise instead of silence, or a lower bit rate, for background signals. The key function that makes this possible is a voice activity detector (VAD), which separates speech from background noise.
Several types of voice activity detectors have been proposed. TS 26.094, see reference [1], discloses a VAD (herein named AMR VAD1), and variations are disclosed in reference [3]. The core features of the AMR VAD1 are: a detector based on summing sub-band Signal-to-Noise Ratios (SNR); threshold adaptation based on signal level; background estimate adaptation based on previous decisions; and deadlock recovery analysis for step increases in noise level.
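The first three features listed above can be illustrated with a minimal sketch. It is not the algorithm specified in TS 26.094: the function names, the linear threshold rule, the use of the background estimate as the level, the smoothing factor `alpha` and all numeric constants are assumptions chosen for illustration only, and deadlock recovery is not modelled.

```python
def adapt_threshold(level, thr_quiet=20.0, thr_loud=8.0,
                    lvl_quiet=1.0, lvl_loud=100.0):
    """Hypothetical rule: interpolate the decision threshold linearly
    between thr_quiet and thr_loud as the level rises, clamped at both ends."""
    if level <= lvl_quiet:
        return thr_quiet
    if level >= lvl_loud:
        return thr_loud
    frac = (level - lvl_quiet) / (lvl_loud - lvl_quiet)
    return thr_quiet + frac * (thr_loud - thr_quiet)


def vad_step(band_energies, noise_est, prev_speech, alpha=0.9):
    """One frame of a summed sub-band SNR detector (illustrative only).

    band_energies: per-band energy of the current frame
    noise_est:     per-band background noise estimate
    prev_speech:   decision made for the previous frame
    Returns (speech_decision, updated_noise_est).
    """
    # Sum the per-band SNRs; the floor avoids division by zero.
    snr_sum = sum(e / max(n, 1e-9) for e, n in zip(band_energies, noise_est))

    # Assumption: the total background estimate stands in for the level
    # that drives threshold adaptation.
    threshold = adapt_threshold(sum(noise_est))
    speech = snr_sum > threshold

    # Adapt the background estimate only when the previous decision was
    # "no speech", so that speech energy does not leak into it.
    if not prev_speech:
        noise_est = [alpha * n + (1.0 - alpha) * e
                     for n, e in zip(noise_est, band_energies)]
    return speech, noise_est
```

A caller would keep `noise_est` and the last decision between frames and feed them back into `vad_step` for the next frame.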
A drawback of the AMR VAD1 is that it is over-sensitive to some types of non-stationary background noise.
Another VAD (herein named EVRC VAD) is disclosed in C.S0014-A, see reference [2], as EVRC RDA, and in reference [4]. The main technologies used are: split band analysis, wherein the worst case band is used for rate selection in a variable rate speech codec; and an adaptive noise hangover, which is added to reduce primary detector mistakes. Noise hangover adaptation is disclosed in reference [5], by Hong et al.
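The two techniques named above can be sketched as follows. This is an illustrative sketch, not the procedure of C.S0014-A: it assumes that "worst case band" means the band demanding the highest rate, the thresholds and rate indices are hypothetical, and the hangover length is a fixed parameter rather than an adapted one.

```python
def band_rate(snr_db, thresholds=(10.0, 20.0)):
    """Map one band's SNR (dB) to a rate index: 0 = lowest, 2 = highest.
    The thresholds are hypothetical values."""
    return sum(snr_db > t for t in thresholds)


def select_rate(band_snrs_db):
    """Split band analysis: each band proposes a rate and the most
    demanding (worst case) band determines the selected rate."""
    return max(band_rate(s) for s in band_snrs_db)


class Hangover:
    """Keeps the decision active for `length` frames after the primary
    detector stops firing, to cover primary detector mistakes."""

    def __init__(self, length):
        self.length = length
        self.remaining = 0

    def apply(self, primary_active):
        if primary_active:
            # Re-arm the hangover counter on every active frame.
            self.remaining = self.length
            return True
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

An adaptive variant would change `length` over time, for example as a function of the estimated noise conditions; how that adaptation is done is left open here.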
A drawback with the split band EVRC VAD is that it occasionally makes bad decisions and shows too low frequency sensitivity.
Voice activity detection is also disclosed by Freeman, see reference [6], wherein a VAD with independent noise spectrum is disclosed, and by Barret, see reference [7], who discloses a tone detector mechanism that does not mistakenly characterize low frequency car noise as signalling tones. A drawback of solutions based on Freeman/Barret is that they occasionally show too low sensitivity (e.g. for background music).