Computers with natural language processing capabilities often must determine which portions of a voice input contain speech that is likely intended to be processed as a candidate natural language query, and which portions contain speech that is likely not so intended. Such segmentation of the voice input may involve the use of an endpointer, which operates to isolate the likely starting point and/or ending point of a particular utterance within the voice input.
Traditional endpointers, which evaluate the duration of pauses between words to designate when an utterance begins or ends, often produce inaccurate results. These inaccuracies can cause a voice input to be segmented incorrectly, which in turn may cause an utterance that was intended as a natural language query to go unprocessed, or to be processed inaccurately. For instance, consider the following conversation between two people, which is received as a voice input by a computer with natural language processing capabilities:
Speaker 1: “Computer, show me directions to Springfield.”
<short pause>
Speaker 2: “No, not Springfield, to Franklin.”
<long pause>
Speaker 1: “No, we want Springfield.”
Were the endpointer to isolate the likely starting points or ending points of these utterances based solely on the existence of long pauses between words, the short pause between the first two utterances would not be treated as a boundary, and the conversation might be improperly segmented as follows:
Input 1: “Computer, show me directions to Springfield, no, not Springfield, to Franklin.”
Input 2: “No, we want Springfield.”
A natural language processor might interpret the first input as a natural language query in which the speaker corrects themselves midway through the utterance, and may classify the second input as likely not a natural language query. Although the first speaker clearly intends for the computer to provide directions to “Springfield,” the incorrect operation of the endpointer may result in the natural language processor misinterpreting this intent, and instead obtaining and presenting directions to “Franklin.”
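The failure mode described above can be illustrated with a minimal Python sketch of an endpointer that relies only on pause duration. Everything in it is an assumption made for illustration: the `Span` type, the `segment_by_pause` function, the 1.0-second `max_pause` threshold, and the timestamps assigned to each utterance are hypothetical and are not taken from any particular endpointer.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A stretch of recognized speech with start and end times in seconds."""
    text: str
    start: float
    end: float


def segment_by_pause(spans, max_pause=1.0):
    """Group spans into inputs, splitting only where the silence between
    consecutive spans exceeds max_pause (a hypothetical threshold)."""
    segments = []
    current = []
    for span in spans:
        # A pause longer than the threshold closes the current input.
        if current and span.start - current[-1].end > max_pause:
            segments.append(current)
            current = []
        current.append(span)
    if current:
        segments.append(current)
    return [" ".join(s.text for s in seg) for seg in segments]


# Hypothetical timings for the example conversation:
# a 0.4 s pause after the first utterance, a 2.0 s pause after the second.
conversation = [
    Span("Computer, show me directions to Springfield.", 0.0, 2.5),
    Span("No, not Springfield, to Franklin.", 2.9, 4.8),
    Span("No, we want Springfield.", 6.8, 8.0),
]

for number, text in enumerate(segment_by_pause(conversation), start=1):
    print(f"Input {number}: {text}")
```

Because the sketch considers nothing but silence duration, the utterances of the two speakers are merged into a single input whenever the pause between them falls below the threshold, regardless of who is speaking or what is said.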