Human-machine interfaces that employ language understanding systems are becoming increasingly popular. These systems are configured to recognize a command spoken by the user and provide an appropriate response to the command. One important characteristic of these systems, which affects the quality of the user experience, is latency (i.e., the length of time taken by the system to respond to the user input). One significant factor contributing to latency is how quickly the end of the user's speech utterance can be detected in connection with speech recognition. One approach to speech endpoint detection relies on voice activity detection based on the presence of signal energy. If voice activity (voice signal energy) is not detected for a predefined period of time, an assumption is made that the user has stopped talking. This technique does not work well, however, in noisy environments where background noise can be mistaken for voice signal energy, resulting in a failure to detect the endpoint, which can cause the system to become unresponsive. Another approach to speech endpoint detection is to declare an endpoint after a predefined period of time has elapsed in which the best hypothesis of the speech recognizer has not changed. This technique, however, requires that the predefined period of time be long enough to guarantee that the longest possible phrase can be spoken, which also increases system latency.
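By way of illustration only, the two conventional approaches described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from any particular system: the function names, the per-frame energy values, the threshold, and the frame counts are all hypothetical, and the "predefined period of time" is expressed as a count of consecutive frames.

```python
from typing import Optional, Sequence


def energy_endpoint(frame_energies: Sequence[float],
                    energy_threshold: float,
                    silence_frames: int) -> Optional[int]:
    """Energy-based endpointing (hypothetical helper).

    Returns the index of the frame at which an endpoint is declared, i.e. the
    frame that completes `silence_frames` consecutive low-energy frames after
    some voice activity has been observed. Returns None if no endpoint is found.
    """
    heard_speech = False
    quiet_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            # Energy above threshold is treated as voice activity.
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None


def stable_hypothesis_endpoint(best_hypotheses: Sequence[str],
                               stable_frames: int) -> Optional[int]:
    """Hypothesis-stability endpointing (hypothetical helper).

    `best_hypotheses[i]` is assumed to be the recognizer's best hypothesis
    after frame i. An endpoint is declared once a non-empty hypothesis has
    remained unchanged for `stable_frames` consecutive frames.
    """
    unchanged = 0
    for i in range(1, len(best_hypotheses)):
        if best_hypotheses[i] and best_hypotheses[i] == best_hypotheses[i - 1]:
            unchanged += 1
            if unchanged >= stable_frames:
                return i
        else:
            unchanged = 0
    return None


# Illustrative (made-up) per-frame energies: speech followed by silence.
clean = [0.0, 0.9, 0.8, 0.1, 0.1, 0.1, 0.1]
# Same utterance with background noise that stays above the threshold.
noisy = [0.0, 0.9, 0.8, 0.7, 0.6, 0.7, 0.6]

clean_result = energy_endpoint(clean, energy_threshold=0.5, silence_frames=3)
noisy_result = energy_endpoint(noisy, energy_threshold=0.5, silence_frames=3)

# Illustrative (made-up) sequence of best hypotheses, one per frame.
hyps = ["", "play", "play some", "play some jazz",
        "play some jazz", "play some jazz", "play some jazz"]
hyp_result = stable_hypothesis_endpoint(hyps, stable_frames=3)
```

In this sketch, the noisy input never produces an endpoint from the energy-based detector, which corresponds to the unresponsive behavior noted above, while the hypothesis-based detector cannot declare an endpoint until the full `stable_frames` waiting period has elapsed, which corresponds to the added latency.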
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent in light of this disclosure.