The present invention relates generally to speech recognition, and, more particularly, to automatic identification of sentence boundaries.
Automatic Speech Recognition (ASR) has proven useful for a number of applications. Typically, the output of an ASR system is a stream of words, in particular when the system outputs text corresponding to audio files. Generally, automatic transcriptions, and sometimes manual transcriptions, of conversations do not contain any punctuation that indicates sentence boundaries. Moreover, punctuation in manual transcriptions is not inserted in a consistent manner. However, many applications, such as information retrieval and natural language processing, benefit from (or even require) sentence structure. State-of-the-art Natural Language Processing (NLP) tools, such as Part-of-Speech (POS) taggers and syntactic parsers, require the input to be a single sentence. Applying such NLP tools to textual data based on ASR output results in significant errors.
The inherent word recognition error of an ASR system and the presence of noise in conversational speech, such as repetitions, false starts, and filler words, make identifying structural information more challenging than it is for well-written text.