1. Field of the Invention
The present invention relates to speech synthesizer systems, and more particularly to an interactive graphical user interface for controlling the acoustical characteristics of a synthesized voice.
2. Background of the Related Art
Most text-to-speech (TTS) systems allow users to alter the acoustical characteristics of a synthesized voice, thereby creating a new or modified synthesized voice. In text-to-speech systems, such as the well-known Bell Labs TTS system, the synthesized voice can be altered by manipulating speech parameters that control the acoustical characteristics of the synthesized voice. In the Bell Labs TTS system, the speech parameters are manipulated using escape sequences, which consist of ASCII codes that indicate to the Bell Labs TTS system how one or more speech parameters are to be altered. The following speech parameters are typically manipulable in a TTS system: pitch, rate, front and back head of the vocal tract, and aspiration.
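By way of illustration only, the following minimal sketch shows how parameter settings might be encoded as an escape sequence and prepended to a text. The bracketed `name=value` syntax, the parameter names, and the functions `escape_sequence` and `with_voice` are assumptions made for this example; the actual escape codes of the Bell Labs TTS system are not reproduced here.

```python
ESC = "\x1b"  # ASCII escape character; the syntax that follows it is hypothetical

# Hypothetical parameter names, loosely following the parameters listed above.
ALLOWED = ("pitch", "rate", "aspiration")


def escape_sequence(**params: float) -> str:
    """Encode parameter settings as one illustrative escape sequence."""
    for name in params:
        if name not in ALLOWED:
            raise ValueError(f"unknown speech parameter: {name}")
    # Sort by name so the same settings always produce the same sequence.
    body = ";".join(f"{name}={value}" for name, value in sorted(params.items()))
    return f"{ESC}[{body}]"


def with_voice(text: str, **params: float) -> str:
    """Prefix a text with an escape sequence that sets the voice for all of it."""
    return escape_sequence(**params) + text
```

In this sketch, `with_voice("Hello", pitch=110)` yields the escape character followed by `[pitch=110]Hello`, so a single sequence governs the entire text that follows it.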
By manipulating the speech parameters, acoustical characteristics of a base synthesized voice may be altered to create new voices or change intonations of utterances. To create specific voices or change the intonation of utterances, a user is often required to undergo a time-consuming process of experimenting with various combinations of escape sequences corresponding to speech parameters before ascertaining whether a particular combination achieves the desired sound. Graphical user interfaces (GUIs) have been developed for TTS systems to facilitate this process of experimenting with various combinations of the escape sequences to create new voices.
Prior art TTS graphical user interfaces provide users with a mechanism for easily manipulating the speech parameters that control the acoustical characteristics of a synthesized voice, and thereby for creating or modifying a synthesized voice. Each word of a text subsequently converted into speech with the new or modified synthesized voice will possess the acoustical characteristics of that voice; that is, each word uttered by the synthesized voice will have the same pitch, rate, etc.
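The uniformity described above can be pictured with a short sketch. The `Voice` record and the `plan_utterance` function are hypothetical names introduced only for this illustration, and only two parameters are modeled; the sketch is not drawn from any particular prior art interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical voice-wide settings; only two parameters are modeled."""
    pitch: float
    rate: float


def plan_utterance(text: str, voice: Voice) -> list[tuple[str, Voice]]:
    """Pair every word with the same voice settings, as a voice-wide control would."""
    return [(word, voice) for word in text.split()]


if __name__ == "__main__":
    for word, settings in plan_utterance("I never said that", Voice(pitch=120.0, rate=1.0)):
        print(word, settings.pitch, settings.rate)
```

Every word in the resulting plan carries identical settings, so no individual word can be emphasized or de-emphasized without changing the voice as a whole.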
Human speakers often vary the acoustical characteristics of their voices such that certain words are emphasized or de-emphasized, perhaps giving different connotations to a phrase or sentence. The prior art TTS GUIs do not permit users to duplicate this human quality of tailoring the prosody of a text. Accordingly, there exists a need for a graphical user interface capable of permitting users to tailor the prosody of a text to be uttered by a text-to-speech system.