1. Field of the Invention
This invention relates to natural language processing in general, and more particularly, to methods and systems for speech synthesis and speech recognition.
2. Description of the Related Art
A speech synthesis system is a machine that accepts as input a text stream and provides as output a speech signal. One component of a speech synthesizer converts words into phonemes. A phoneme is a member of the set of the smallest units of speech that serve to distinguish one utterance from another in a language or dialect. The /p/ of "pat" and the /f/ of "fat" are two examples. Typically, the conversion from text to phonemes is performed either by looking the words up in a dictionary or by sounding them out from their orthography (e.g., spelling) according to a set of phonetic principles. An excellent tutorial on the topic is D. H. Klatt, "Review of text-to-speech conversion for English," J. Acoust. Soc. Am., Vol. 82(3), pp. 737-775 (Sept. 1987).
Each approach has advantages and disadvantages. The dictionary approach provides the highest-quality output but fails for words (e.g., proper nouns) that are not in the dictionary. The rule-based approach covers far more words but produces unacceptable results for irregular words, whose pronunciation does not follow from their spelling. Today, most speech synthesizers combine the two: the dictionary approach is used when possible, and the rule-based approach is used when a word is not found in the dictionary.
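The combined strategy of dictionary lookup with a rule-based fallback can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular synthesizer: the tiny lexicon, the ARPAbet-style phoneme symbols, and the letter-to-sound table are all hypothetical, and real rule sets are far larger and depend on the surrounding letters.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature pronunciation dictionary (ARPAbet-style symbols assumed).
LEXICON: Dict[str, List[str]] = {
    "pat": ["P", "AE", "T"],
    "fat": ["F", "AE", "T"],
    "colonel": ["K", "ER", "N", "AH", "L"],  # irregular: spelling is a poor guide
}

# Hypothetical, greatly simplified letter-to-sound rules.
# Two-letter patterns are tried before single letters.
DIGRAPHS: Dict[str, List[str]] = {
    "ch": ["CH"], "sh": ["SH"], "th": ["TH"], "ph": ["F"], "ck": ["K"], "ng": ["NG"],
}
SINGLE: Dict[str, List[str]] = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"],
    "y": ["Y"], "z": ["Z"],
}


def letter_to_sound(word: str) -> List[str]:
    """Sound a word out from its spelling using the hypothetical rules above."""
    phonemes: List[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.extend(DIGRAPHS[pair])
            i += 2
        elif word[i] in SINGLE:
            phonemes.extend(SINGLE[word[i]])
            i += 1
        else:
            i += 1  # skip characters the rules do not cover (digits, punctuation)
    return phonemes


def text_to_phonemes(word: str) -> Tuple[List[str], str]:
    """Use the dictionary when possible; otherwise fall back to the rules."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key], "dictionary"
    return letter_to_sound(key), "rules"
```

Calling `text_to_phonemes("Smith")` misses the dictionary and falls through to the rules, yielding `S M IH TH`. Applying the rules alone to "colonel" gives `K AA L AA N EH L`, which shows why irregular words need a dictionary entry.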
A speech recognition system is a machine that performs the inverse function of a speech synthesis system. It accepts as input a speech signal and outputs a text stream representing that speech. One component of a speech recognition system converts phonemes or sequences of phonemes into words. As in contemporary speech synthesis systems, this conversion is usually performed using the dictionary approach when possible and the rule-based approach otherwise.
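The inverse direction, from one word's phonemes to a spelling, can be sketched the same way: the pronunciation dictionary is indexed by phoneme sequence, and unmatched sequences are spelled out by rule. As before, the lexicon, phoneme symbols, and sound-to-letter table are hypothetical, and the sketch assumes the phoneme stream has already been divided into words.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature lexicon (ARPAbet-style symbols assumed).
LEXICON: Dict[str, List[str]] = {
    "pat": ["P", "AE", "T"],
    "colonel": ["K", "ER", "N", "AH", "L"],
    "pair": ["P", "EH", "R"],
    "pear": ["P", "EH", "R"],  # homophone of "pair"
}

# Inverse index: phoneme sequence -> every word with that pronunciation.
INVERSE: Dict[Tuple[str, ...], List[str]] = {}
for word, pron in LEXICON.items():
    INVERSE.setdefault(tuple(pron), []).append(word)

# Hypothetical, greatly simplified sound-to-letter rules (one spelling per phoneme).
SOUND_TO_LETTERS: Dict[str, str] = {
    "P": "p", "B": "b", "T": "t", "D": "d", "K": "k", "G": "g",
    "F": "f", "V": "v", "S": "s", "Z": "z", "SH": "sh", "TH": "th",
    "CH": "ch", "M": "m", "N": "n", "NG": "ng", "L": "l", "R": "r",
    "HH": "h", "W": "w", "Y": "y", "AE": "a", "EH": "e", "IH": "i",
    "AA": "o", "AH": "u", "ER": "er",
}


def phonemes_to_words(phonemes: List[str]) -> Tuple[List[str], str]:
    """Return candidate spellings for one word's phonemes and the method used."""
    key = tuple(phonemes)
    if key in INVERSE:
        return INVERSE[key], "dictionary"
    # Fallback: spell each phoneme with its default letters; "?" marks an unknown symbol.
    spelling = "".join(SOUND_TO_LETTERS.get(p, "?") for p in phonemes)
    return [spelling], "rules"
```

`phonemes_to_words(["P", "EH", "R"])` returns both "pair" and "pear". A dictionary lookup alone cannot choose between homophones, so a practical recognizer would need additional context to decide.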