Field of the Invention
The present invention relates generally to speech recognition systems and, in particular, to a system for determining the location of isolated words within a speech signal.
Description of Related Art
A wide variety of speech recognition systems have been developed. Typically, such systems receive a time-varying speech signal representative of spoken words and phrases. The speech recognition system attempts to determine the words and phrases within the speech signal by analyzing components of the speech signal. As a first step, most speech recognition systems must isolate portions of the speech signal that convey spoken words from portions carrying silence. To this end, the systems attempt to determine the beginning and ending boundaries of a word or group of words within the speech signal. Accurate and reliable determination of the beginning and ending boundaries of words or sentences poses a challenging problem, particularly when the speech signal includes background noise.
A variety of techniques have been developed for analyzing a time-varying speech signal to determine the location of an isolated word or group of words within the signal. Typically, the intensity of the speech signal is measured. Portions of the speech signal having an intensity greater than a minimum threshold are designated as being "speech," whereas those portions of the speech signal having an intensity below the threshold are designated as being silent portions or "nonspeech." Unfortunately, such simple discrimination techniques have been unreliable, particularly where substantial noise is present in the signal. Indeed, it has been estimated that more than half of the errors occurring in a typical speech recognition system are the result of an inaccurate determination of the location of the words within the speech signal. To minimize such errors, the technique for locating isolated words within the speech signal must be capable of reliably and accurately locating the boundaries of the words, despite a high noise level. Further, the technique must be sufficiently simple and quick to allow for real time processing of the speech signal. Furthermore, the technique must be capable of adapting to a variety of noise environments without any a priori knowledge of the noise. The ability to accurately and reliably locate the boundaries of isolated words in any of a variety of noise environments is generally referred to as the robustness of the technique. Heretofore, a robust technique for accurately locating words within a speech signal has not been developed.
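By way of illustration only, the conventional intensity-threshold discrimination described above may be sketched as follows. The sketch is not the technique of the present invention. The function names, the frame length of 80 samples, the use of mean squared amplitude as the intensity measure, the threshold of 0.01, and the synthetic sinusoidal signals are all assumptions chosen for the example and do not appear in the description above.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed intensity measure)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def label_frames(energies, threshold):
    """True ("speech") where intensity exceeds the threshold, else False ("nonspeech")."""
    return [e > threshold for e in energies]

def word_boundaries(labels):
    """Indices of the first and last speech frames, or None if no frame is speech."""
    speech = [i for i, is_speech in enumerate(labels) if is_speech]
    if not speech:
        return None
    return speech[0], speech[-1]

def tone(amplitude, num_samples):
    """Synthetic stand-in for a signal segment (hypothetical test data)."""
    return [amplitude * math.sin(0.3 * n) for n in range(num_samples)]

FRAME_LEN = 80    # assumed frame length in samples
THRESHOLD = 0.01  # assumed fixed intensity threshold

# Low-noise case: 3 near-silent frames, 4 loud frames, 3 near-silent frames.
clean = tone(0.01, 3 * FRAME_LEN) + tone(0.5, 4 * FRAME_LEN) + tone(0.01, 3 * FRAME_LEN)
clean_bounds = word_boundaries(label_frames(frame_energies(clean, FRAME_LEN), THRESHOLD))

# High-noise case: the same word, but the "silent" frames carry louder background noise.
noisy = tone(0.2, 3 * FRAME_LEN) + tone(0.5, 4 * FRAME_LEN) + tone(0.2, 3 * FRAME_LEN)
noisy_bounds = word_boundaries(label_frames(frame_energies(noisy, FRAME_LEN), THRESHOLD))

print("low noise boundaries: ", clean_bounds)
print("high noise boundaries:", noisy_bounds)
```

In the low-noise case the fixed threshold recovers the four loud frames as the word. In the high-noise case the background intensity itself exceeds the threshold, so every frame is designated "speech" and the reported boundaries span the entire signal, which is the kind of failure attributed above to simple discrimination techniques.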