Typical speech recognizers require an utterance detector to indicate where to start and to stop the recognition of the incoming speech stream. Most utterance detectors use signal energy as basic speech indicator. See, for example, J.-C. Junqua, B. Mak, and B. Reaves, “A robust algorithm for word boundary detection in the presence of noise,” IEEE Trans. on Speech and Audio Processing, 2(3):406–412, July 1994 and L. Lamels, L. Rabiner, A. Rosenberg, and J. Wilpon, “An improved endpoint detector for isolated word recognition,” IEEE ASSP Mag., 29:777–785, 1981.
In applications such as hands-free speech recognition in a car driven on a highway, the signal-to-noise ratio can be less than 0 db. That means that the energy of noise is about the same as that of the signal. Obviously, while speech energy gives good results for clean to moderately noisy speech, it is not adequate for reliable detection under such a noisy situation.