1. Field of the Invention
The present invention relates to the field of text-to-speech processing and, more particularly, to disambiguating text that is to be converted to speech using configurable lexeme-based rules.
2. Description of the Related Art
One significant challenge in automatically converting text to speech (TTS) is handling ambiguous text constructs. Ambiguity can come in many forms, such as abbreviations, acronyms, and homographs. Numerous techniques exist for handling such ambiguous text constructs, though each technique has drawbacks.
One conventional technique is to determine the part of speech of the text construct and to disambiguate it based upon this determination. While this is useful for ambiguous constructs that can be distinguished based on their part of speech, this technique cannot effectively handle constructs that do not have a common part of speech. Further, many text segments that are to be speech synthesized are not written in a grammatically precise manner, preventing an accurate determination of the part of speech. For example, text messages, conversational dialogues, and the like are often short, broken text segments, which do not perfectly conform to strict grammar rules.
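As an illustration only, part-of-speech disambiguation can be sketched as a lookup keyed by the word and its tag. The table `PRONUNCIATIONS`, the function `pronounce`, and the informal pronunciation strings are hypothetical and are not drawn from any particular TTS system; the part-of-speech tag is assumed to be supplied by an upstream tagger.

```python
# Hypothetical pronunciation table keyed by (word, part of speech).
# The homograph "record" is stressed differently as a noun and as a verb.
PRONUNCIATIONS = {
    ("record", "noun"): "REK-ord",
    ("record", "verb"): "rih-KORD",
}


def pronounce(word, part_of_speech):
    """Return the pronunciation for the tagged word, or None if the tag gives no answer."""
    return PRONUNCIATIONS.get((word, part_of_speech))


as_noun = pronounce("record", "noun")
as_verb = pronounce("record", "verb")
# A broken or ungrammatical fragment may yield no usable tag at all.
untagged = pronounce("record", None)
```

When the tagger cannot assign a part of speech, as with short, broken text segments, the lookup has nothing to select on and the ambiguity remains.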
Another disambiguation technique is to determine a dialog context or topic type and to use the dialog context to prefer various possible interpretations over others. The different possible text constructs are selectively mapped to different dialog contexts to resolve ambiguities. For example, the text construct “MS” can be disambiguated as an acronym for multiple sclerosis in a dialog context of medicine and can be disambiguated as an abbreviation for Mississippi in a dialog context of geography. However, it can be extremely difficult to foresee all the potential dialog contexts in which ambiguous text constructs can be used and to create suitable mappings.
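The mapping of text constructs to dialog contexts can be sketched as a nested lookup table. This is a minimal illustration only: the table `CONTEXT_MAP`, the function `disambiguate_by_context`, and the context labels are hypothetical names chosen to mirror the "MS" example.

```python
# Hypothetical mapping: text construct -> dialog context -> interpretation.
CONTEXT_MAP = {
    "MS": {
        "medicine": "multiple sclerosis",
        "geography": "Mississippi",
    },
}


def disambiguate_by_context(construct, dialog_context):
    """Return the mapped interpretation, or the construct unchanged if no mapping exists."""
    return CONTEXT_MAP.get(construct, {}).get(dialog_context, construct)


medical = disambiguate_by_context("MS", "medicine")
geographic = disambiguate_by_context("MS", "geography")
# A dialog context nobody foresaw has no mapping, so the construct stays ambiguous.
unforeseen = disambiguate_by_context("MS", "computing")
```

The last call shows the drawback: any dialog context missing from the table leaves the construct unresolved.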
Most conventional disambiguation techniques, such as the ones described above and hybrid solutions including aspects of the above techniques, are implemented using programmatic logic that is embedded within software code. This logic can be difficult, if not impossible, for a user to modify based upon usage considerations. Because of this, conventional disambiguation techniques have difficulty coping with the addition of new terms to a vernacular (e.g., IPOD) and may not be situationally configurable.
From an implementation standpoint, conventional disambiguation techniques often handle different types of ambiguous text constructs in different ways and in different processing stages. For example, acronyms and abbreviations can be expanded during a pre-processing stage, which executes before homograph disambiguation occurs. A multi-stage processing technique can be time consuming, which is problematic for real-time speech processing, and can consume significant computing resources, which can be problematic for resource-constrained devices (e.g., smart phones, navigation systems, etc.). Further, a conventional staged disambiguation approach can inhibit competition among different types of ambiguities. For example, an acronym pre-processing stage can expand the text construct COD to mean cash on delivery without weighing the merits of interpreting COD as the word cod, a type of fish.
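The loss of competition in a staged approach can be sketched as follows. This is an illustrative toy, not an actual TTS pipeline: the table `ACRONYMS`, the stage functions, and the sample sentence are all hypothetical.

```python
# Hypothetical acronym table consulted by the pre-processing stage.
ACRONYMS = {"COD": "cash on delivery"}


def expand_acronyms(tokens):
    """Stage 1: unconditionally replace every token found in the acronym table."""
    return [ACRONYMS.get(token, token) for token in tokens]


def disambiguate_homographs(tokens):
    """Stage 2: placeholder for homograph handling; it only sees stage 1 output."""
    return tokens


def staged_pipeline(text):
    """Run the two stages in fixed order on whitespace-separated tokens."""
    tokens = text.split()
    tokens = expand_acronyms(tokens)
    tokens = disambiguate_homographs(tokens)
    return " ".join(tokens)


result = staged_pipeline("FRESH COD FILLETS")
```

Because the first stage commits to the acronym reading before any later stage runs, the fish interpretation of COD is never considered, even though the surrounding words favor it.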