1. Field of the Invention
The present invention relates to automatic speech recognition and more specifically to recognizing and translating speech.
2. Introduction
Automatic speech processing has advanced significantly but is still largely compartmentalized. For instance, automatic speech recognition typically transcribes speech orthographically and hence fails to adequately capture context beyond the words themselves. Enriched transcription combines automatic speech recognition, speaker identification and natural language processing with the goal of producing richly annotated speech transcriptions that are useful both to human readers and to automated programs for indexing, retrieval and analysis. Some examples of enriched transcription include punctuation detection, topic segmentation, disfluency detection and clean-up, semantic annotation, pitch accent and boundary tone detection, speaker segmentation, speaker recognition, and annotation of speaker attributes. These meta-level tags serve as an intermediate representation of the context of an utterance, complementing the content provided by the orthographic transcription.
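By way of illustration only, the relationship between an orthographic transcription and its meta-level tags might be sketched as a simple data structure. The class names, field names and sample utterance below are hypothetical and chosen for this sketch; they are not drawn from any particular recognizer or annotation scheme.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """One recognized word plus hypothetical meta-level tags."""
    word: str
    punctuation_after: str = ""   # result of punctuation detection
    pitch_accent: bool = False    # prosodic prominence on this word
    disfluent: bool = False       # filler or repair flagged by disfluency detection


@dataclass
class EnrichedUtterance:
    """An utterance with utterance-level tags and tagged tokens."""
    speaker: Optional[str] = None  # from speaker segmentation/recognition
    topic: Optional[str] = None    # from topic segmentation
    tokens: List[Token] = field(default_factory=list)

    def orthographic(self) -> str:
        # Plain word sequence: the content without any context tags.
        return " ".join(t.word for t in self.tokens)

    def cleaned(self) -> str:
        # Uses the tags: drops disfluencies and restores punctuation.
        return " ".join(
            t.word + t.punctuation_after
            for t in self.tokens
            if not t.disfluent
        )


# Assumed sample utterance, for illustration only.
utt = EnrichedUtterance(
    speaker="spk1",
    topic="travel",
    tokens=[
        Token("i"),
        Token("uh", disfluent=True),
        Token("want"),
        Token("a"),
        Token("ticket", punctuation_after=".", pitch_accent=True),
    ],
)

print(utt.orthographic())
print(utt.cleaned())
```

In this sketch the orthographic view discards the tags entirely, while the cleaned view shows one way in which a downstream program could make use of them.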
Accordingly, what is needed in the art is an improved way to enrich automatic speech translation with information beyond the text to be translated.