1. Field of the Invention
The present invention relates to a method and an apparatus for determining the endpoints of a speech utterance, and more specifically to such a method and apparatus which accurately detect the beginning and end of an input speech signal, especially one having a low signal-to-noise ratio.
2. Description of the Prior Art
An important problem in speech processing is to detect the presence of speech in a background of noise. This problem is often referred to as the endpoint location problem. By accurately detecting the beginning and end of an utterance, the amount of processing of speech data can be kept to a minimum.
A known approach to locating the endpoints of a speech utterance is to compare the whole power (or a value proportional to the whole power) of an input speech signal with a threshold level. The beginning is determined when the whole power of the input speech signal exceeds the threshold. On the other hand, when the whole power falls below the threshold for more than a predetermined time interval, the time point at which the whole power crossed the threshold is deemed to be the end point. This prior art, however, has encountered the problem that if white noise is superimposed on the input speech signal, accurate detection of the endpoints cannot be expected because of the decreased signal-to-noise ratio. This prior art is described in "A Parametrically Controlled Spectral Analysis System for Speech", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-22, No. 5, October 1974, and also in "An Algorithm for Determining the Endpoints of Isolated Utterances", The Bell System Technical Journal, Vol. 54, No. 2, February 1975.
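By way of illustration only, the threshold comparison described above may be sketched as follows. This is a minimal sketch and not the implementation of either cited reference. The function names frame_power and find_endpoints, the use of the mean squared amplitude of non-overlapping frames as the "whole power", and the expression of the predetermined time interval as a number of frames (min_silence_frames) are all assumptions introduced for the example.

```python
from typing import List, Optional, Sequence, Tuple


def frame_power(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (assumed power measure)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(
    power: Sequence[float],
    threshold: float,
    min_silence_frames: int,
) -> Optional[Tuple[int, int]]:
    """Return (begin, end) frame indices, or None if the threshold is never exceeded.

    begin: first frame whose power exceeds the threshold.
    end:   frame at which the power fell below the threshold, provided it then
           stays below for more than min_silence_frames frames.
    """
    begin = None
    drop = None  # frame at which the power most recently fell below the threshold
    for i, p in enumerate(power):
        if p > threshold:
            if begin is None:
                begin = i
            drop = None  # a short dip does not terminate the utterance
        elif begin is not None:
            if drop is None:
                drop = i
            if i - drop + 1 > min_silence_frames:
                return begin, drop
    if begin is None:
        return None
    # Assumption: if the input ends before a long enough silence is seen,
    # use the last downward crossing, or the end of the input if there is none.
    return begin, (drop if drop is not None else len(power))


# Hypothetical frame powers: a clean utterance with one short internal dip.
clean = [0.1, 0.2, 5.0, 6.0, 0.3, 4.0, 0.2, 0.1, 0.1, 0.1]
print(find_endpoints(clean, threshold=1.0, min_silence_frames=2))  # (2, 6)

# The same utterance with a constant noise power of 2.0 added to every frame.
noisy = [p + 2.0 for p in clean]
print(find_endpoints(noisy, threshold=1.0, min_silence_frames=2))  # (0, 10)
```

In the second call every frame exceeds the threshold, so the detected endpoints span the whole input. This corresponds to the difficulty noted above when noise is superimposed on the input speech signal.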