1. Technical Field
The present invention relates to speech recognition and, more particularly, to a system and method for detecting disfluency in speech.
2. Discussion of Related Art
Disfluency is common in speech. Detecting disfluency can improve the readability of speech transcripts. It can also benefit downstream natural language processing tasks such as summarization, machine translation, and parsing.
There has been a significant amount of work on disfluency detection. Several disfluency detection systems were developed as part of the DARPA EARS Rich Transcription program. Most of the proposed systems use combinations of prosodic and lexical features. Some systems, however, are purely lexically driven and use no acoustic features.
Adding prosodic features to word-based features has some advantages. For example, a speaker's intonation is usually disrupted at the interruption point, which marks some form of restart. Prosodic features are also useful for disfluency detection in languages that lack adequate natural language tools.
Although combining lexical and prosodic features has clear advantages, prosodic features are not always readily available in some applications. This is especially true of online systems such as speech-to-speech translation. In such systems, the extra processing of the speech signal needed to extract acoustic features adds delay, which may degrade the overall user experience.
Therefore, a need exists for a system and method for disfluency detection.