One arrangement that allows double talk in a full-duplex communication channel is echo cancellation, in which the outbound speech is cancelled on the inbound path. However, this arrangement is very expensive in terms of computational resources and is therefore often not feasible.
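For illustration only, the following minimal sketch shows one common way such cancellation can be done, using a normalised LMS (NLMS) adaptive filter. NLMS is an assumption chosen here for the example; the text above does not name a particular algorithm. The function name, tap count and step size are hypothetical. The per-sample loop over all taps also hints at why the computational cost grows with the length of the echo path.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples sent to the loudspeaker (outbound speech).
    mic:     samples picked up by the microphone (inbound path).
    Returns the residual after the echo estimate is removed.
    All parameter values are illustrative assumptions.
    """
    w = [0.0] * taps      # adaptive estimate of the echo path
    buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate
        norm = sum(b * b for b in buf) + eps   # input power normalisation
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual
```

With no near-end talker and a short, fixed echo path, the residual should decay towards zero as the filter converges.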
At the other extreme, echoes in the vehicular environment can be handled by an echo suppressor based on standard voice detection against a noisy background.
CCITT Recommendation G.164 (1988), Fascicle III.1, pages 186-205, describes generalised echo suppressors, among which a Type D echo suppressor is entirely digital and provides voice coding of a near-end voice and decoding (or synthesis) of a far-end voice. In such arrangements, "background speech" parameters are available if, for example, LPC-type coding is performed, as in many modern communication systems.
However, a conventional echo suppressor will not operate well in a full-duplex situation where the loudspeaker power is comparable to the power of the local user's voice, since its voice detection relies on distinguishing the power and/or characteristics of speech from those of stationary background noise. Thus, either the loudspeaker voice will be detected by the voice activity detector (VAD), or the local user will be blocked whenever the far-end VAD indicates activity.
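The second failure mode can be made concrete with a minimal sketch of a suppressor gate. The function name and the `loss` parameter are hypothetical and serve only to illustrate the blocking behaviour described above; this is not the logic of any particular recommendation.

```python
def suppress(mic_frames, far_end_active, loss=0.0):
    """Attenuate each microphone frame while the far-end VAD reports activity.

    mic_frames:     list of frames (lists of samples) from the microphone.
    far_end_active: one boolean per frame from a far-end VAD.
    loss:           gain applied while suppressing (0.0 mutes completely).
    """
    return [
        [s * loss for s in frame] if active else list(frame)
        for frame, active in zip(mic_frames, far_end_active)
    ]
```

Because the gate acts on the whole microphone frame, any local speech present during far-end activity is attenuated together with the echo, which is why double talk is lost.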
Standard present-day VADs in the VSP context are based on measuring the signal energy relative to the background noise energy (Noise Riding Threshold (NRT) type detection). To implement such a detector, the VAD must first be able to detect noise and estimate its instantaneous energy level, and then detect speech when the signal energy exceeds a threshold above the noise floor. In more advanced VADs, the spectral characteristics of the noise are also estimated, and the energy at the output of the whitening inverse filter is used for the VAD. The process of distinguishing noise from voice is sometimes augmented by extracting additional features, e.g. a stationarity test and/or a periodicity check (the noise being stationary and non-periodic compared with speech).
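The basic NRT-type detection described above might be sketched as follows. This is a simplified illustration, not a standardised detector: the function names, the 6 dB margin, the smoothing factor and the assumption that the first frame contains only noise are all hypothetical choices made for the example.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def nrt_vad(frames, margin_db=6.0, alpha=0.9):
    """Flag a frame as speech when its energy rides above the noise floor.

    margin_db: how far above the noise estimate the threshold sits (assumed).
    alpha:     smoothing factor for the noise energy estimate (assumed).
    """
    ratio = 10.0 ** (margin_db / 10.0)
    noise = None
    decisions = []
    for frame in frames:
        energy = frame_energy(frame)
        if noise is None:
            noise = energy  # assumption: the first frame is noise only
        is_speech = energy > ratio * noise
        if not is_speech:
            # Track the noise floor only during frames judged to be noise.
            noise = alpha * noise + (1.0 - alpha) * energy
        decisions.append(is_speech)
    return decisions
```

Updating the noise estimate only during non-speech frames keeps the threshold from rising while someone is talking, at the cost of slower adaptation if the noise level jumps.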
GSM recommendation 06.32 of 22 May 1989 is an example of an energy-based VAD in which the energy is measured at the output of the inverse of the background noise shaping filter (the "whitening" filter). The details, including the procedures for adapting the threshold and calculating the filtered energy, are given in that document.
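As a rough time-domain illustration of measuring energy at the output of a whitening filter, the sketch below assumes the background noise is modelled by an all-pole shaping filter 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p, so that the inverse filter is the FIR filter A(z). The function name and the sign convention are assumptions made for this example; the sketch does not reproduce the procedure of the cited recommendation.

```python
def whitened_energy(frame, a):
    """Mean squared output of the inverse (whitening) filter A(z).

    frame: list of input samples.
    a:     coefficients [a1, ..., ap] of A(z) = 1 + sum(a_k * z^-k),
           assumed to have been estimated from background noise.
    """
    total = 0.0
    for n in range(len(frame)):
        e = frame[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                e += ak * frame[n - k]   # samples before the frame are taken as zero
        total += e * e
    return total / len(frame)
```

For input that matches the noise model, the filtered energy is much lower than the raw energy, so a signal with a different spectral shape, such as speech, stands out more clearly against the threshold.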