1. Technical Field
The present disclosure relates to unified normalization in text-to-speech and automatic speech recognition, and more specifically to using a “unified framework” lexicon for both text-to-speech and training of an automatic speech recognition system.
2. Introduction
Both text-to-speech (TTS) systems and automatic speech recognition (ASR) systems utilize lexicons to accomplish specific goals. In a TTS system, text is converted to speech using a lexicon containing stress pattern information, so that the speech associated with specific symbols or letters is pronounced according to the stress patterns in the lexicon. The lexicon of an ASR system generally identifies multiple plausible interpretations of speech based on context, parts of speech, etc., and then uses context to determine which plausible interpretation best matches the speech. However, because ASR lexicons associate a single pronunciation with multiple words and do not contain stress pattern information, and because TTS lexicons do not associate a single pronunciation with multiple possible interpretations, TTS and ASR lexicons are generally isolated to their respective functions.
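By way of illustration only, the structural difference between the two kinds of lexicon described above can be sketched as follows. The entries, the ARPAbet-style phone notation with trailing stress digits, and the names TTS_LEXICON, ASR_LEXICON, tts_pronounce, strip_stress, and asr_candidates are all hypothetical and are not taken from the disclosure; the sketch assumes a TTS lexicon keyed by word and an ASR lexicon keyed by stress-free pronunciation.

```python
# Hypothetical TTS lexicon: each word maps to ONE pronunciation that carries
# stress information (a trailing digit on vowel phones; 1 = primary, 0 = none).
TTS_LEXICON = {
    "read": "R IY1 D",
    "red": "R EH1 D",
    "record": "R EH1 K ER0 D",
}

# Hypothetical ASR lexicon: each stress-free pronunciation maps to SEVERAL
# plausible words; context would later be used to choose among them.
ASR_LEXICON = {
    "R EH D": ["red", "read"],
    "R IY D": ["read", "reed"],
}


def tts_pronounce(word):
    """Return the single stressed pronunciation stored for a word."""
    return TTS_LEXICON[word.lower()]


def strip_stress(pronunciation):
    """Remove trailing stress digits from every phone in a pronunciation."""
    return " ".join(phone.rstrip("012") for phone in pronunciation.split())


def asr_candidates(pronunciation):
    """Return every word the ASR lexicon associates with a pronunciation."""
    return ASR_LEXICON.get(strip_stress(pronunciation), [])
```

In this sketch, the pronunciation returned by tts_pronounce for "red" cannot be looked up directly in the ASR lexicon until its stress marks are removed, and the ASR lookup then yields more than one word, which illustrates why the two lexicons are not interchangeable as written.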