A text-to-speech (TTS) system is one of the human-machine interfaces that use speech. TTS systems, which can be implemented in software or hardware, convert normal language text into speech. They are used in many applications, such as car navigation systems, information retrieval over the telephone, voice mail, speech-to-speech translation systems, and comparable applications, with the goal of synthesizing speech with natural human voice characteristics. Modern text-to-speech systems give users access to a multitude of services integrated in interactive voice response systems. Telephone customer service is one example of the rapid proliferation of text-to-speech functionality in interactive voice response systems.
Many systems employing a TTS engine require human-like voice output to speak static content (prompts). When the person who would record the prompts is not available, a prompt generation tool is usually used to help generate such prompts. A prompt generation tool helps people manipulate text-to-speech output to achieve better prosody, naturalness, and similar qualities. A common deficiency of these tools is that they are neither easy to use nor efficient at producing a satisfying result, because waveform representations are hard to understand for people with little or no speech synthesis background.