The Field of the Invention
The invention relates to methods and apparatus for synthesizing speech.
Description of Related Art
Rule-based speech synthesis is used for various types of speech synthesis applications including Text-To-Speech (TTS) and voice response systems. Typical rule-based speech synthesis techniques involve concatenating pre-recorded phonemes to form new words and sentences.
Previous concatenative speech synthesis systems create synthesized speech by using single stored samples for each phoneme in order to synthesize a phonetic sequence. A phoneme, or phone, is a small unit of speech sound that serves to distinguish one utterance from another. For example, in the English language, the phoneme /r/ corresponds to the letter “R” while the phoneme /t/ corresponds to the letter “T”. Synthesized speech created by this technique sounds unnatural and is usually characterized as “robotic” or “mechanical.”
More recently, speech synthesis systems started using large inventories of acoustic units with many acoustic units representing variations of each phoneme. An acoustic unit is a particular instance, or realization, of a phoneme. Large numbers of acoustic units can all correspond to a single phoneme, each acoustic unit differing from one another in terms of pitch, duration, and stress as well as various other qualities. While such systems produce a more natural sounding voice quality, to do so they require a great deal of computational resources during operation. Accordingly, there is a need for new methods and apparatus to provide natural voice quality in synthetic speech while reducing the computational requirements.