The present invention relates to word recognition for a speech recognition system and, more particularly, to word recognition without predetermining the endpoints of input words.
Speech recognition systems generally employ word recognition methods to match a word spoken by the system user to a "word template", i.e., a data set stored in the system that represents a spoken word. Traditionally, one of the first steps a typical speech recognition system performs is to determine the endpoints of the speech utterance, which define the word to be recognized. These endpoints must be determined accurately, since incorrect endpoints cannot be compensated for later in the recognition process.
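For illustration only, a conventional endpoint detector can be sketched as a frame-energy threshold: the utterance is taken to start at the first frame whose energy exceeds a fixed threshold and to end at the last such frame. The text above does not specify any particular detection method; the functions `frame_energies` and `detect_endpoints`, the frame length of 160 samples, and the threshold of 0.01 are hypothetical choices made for this sketch.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (partial tail frame dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_endpoints(samples, frame_len=160, threshold=0.01):
    """Hypothetical energy-threshold detector.

    Returns (first_frame, last_frame) for frames whose energy exceeds
    the threshold, or None if no frame does.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Silence, a short burst of "speech", then silence again.
clean = [0.0] * 320 + [0.5, -0.5] * 160 + [0.0] * 320
print(detect_endpoints(clean))

# The same burst with constant low-level background noise around it.
noisy = [0.2] * 320 + [0.5, -0.5] * 160 + [0.2] * 320
print(detect_endpoints(noisy))
```

In the clean signal the detector isolates the burst in frames 2 to 3, but modest background noise pushes every frame over the fixed threshold, so the detected endpoints expand to cover the whole recording. This is the kind of error that cannot be undone by later recognition stages.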
Endpoint detection has proven to be a difficult task. In noisy environments, the difficulty is compounded by the lack of clear acoustic cues marking a word's starting and ending points. Even in quiet environments, endpoint detection can be hampered by speaker-induced noises such as lip smacks and breathing. Apart from noise interference, continuous speech may confuse the endpoint detector when the spoken word is part of a phrase or sentence. Endpoint detection is therefore a well-recognized problem in the speech recognition field.
Other attempts have been made at word spotting without predetermined endpoints, but even those methods that have enjoyed some success have proven either computationally prohibitive or detrimental to the performance of the system's recognizer.