1. Field of the Invention
The present invention generally relates to speech processing techniques and, more particularly, the invention relates to a method and apparatus for performing prosody-based speech processing.
2. Description of the Related Art
Speech processing is used to produce signals for controlling devices or software programs, transcription of speech into written words, extraction of specific information from speech, classification of speech into document categories, archival and late retrieval of such information, and other related tasks. All such speech processing tasks are faced with the problem of locating within the speech signal suitable speech segments for processing. Segmenting the speech signal simplifies the signal processing required to identify words. Since spoken language is not usually produced with explicit indicators of such segments, segmentation within a speech processor may occur with respect to commands, sentences, paragraphs or topic units.
For example, a system that is controlled by voice commands needs to determine when a command uttered by a user is complete, i.e., when the system can stop waiting for further input and begin interpreting the command. The process used to determine whether a speaker has completed an utterance, e.g., a sentence or command, is known as endpointing. Generally, endpointing is performed by measuring the length of a pause in the speech signal. If the pause is sufficiently long, the endpointing process deems the utterance complete. However, endpointing processes that rely on pause duration are fraught with errors. For example, many times a speaker will pause in mid-sentence while thinking about what is to be said next. An endpointing process that is based upon pause sensing will identify such pauses as occurring at the end of a sentence, when that is not the case. Consequently, the speech recognition processing that is relying upon accurate endpointing will erroneously process the speech signal.
Therefore, there is a need in the art for a method and apparatus that accurately identifies an endpoint in a speech signal.