1. Field of the Invention
Embodiments of the present invention pertain to voice applications. More specifically, embodiments of the present invention pertain to automatic speech synthesis.
2. Related Art
Conventionally, techniques used for computer-based or computer-generated speech fall into a couple of broad categories. One such category includes techniques commonly referred to as text-to-speech (TTS). With TTS, text is “read” by a computer system and converted to synthesized speech. A problem with TTS is that the voice synthesized by the computer system is mechanical sounding and consequently not very lifelike.
Another category of computer-based speech is commonly referred to as a voice response system. A voice response system overcomes the mechanical nature of TTS by first recording, using a human voice, all of the various speech segments (e.g., individual words and sentence fragments) that might be needed for a message, and then storing these segments in a library or database. The segments are pulled from the library or database and assembled (e.g., concatenated) into the message to be delivered. Because these segments are recorded using a human voice, the message is delivered in a more lifelike manner than TTS. However, while more lifelike, the message still may not sound totally natural because of the presence of small but audible gaps between the concatenated segments.
Thus, contemporary concatenated recorded speech sounds choppy and unnatural to a user of a voice application. Accordingly, methods and/or systems that more closely mimic actual human speech would be valuable.