1. Technical Field
The present invention relates to speech synthesis and, in particular, to using prosodic information for speech synthesis. Still more particularly, the present invention provides a method, apparatus, and program for transmitting text messages for synthesized speech.
2. Description of Related Art
Speech synthesis systems convert text to speech for audible output. Speech synthesizers may use a plurality of stored speech segments and their associated representations (i.e., a vocabulary) to generate speech by concatenating the stored speech segments. However, because no information is provided with the text as to how the speech should be generated, the result is typically unnatural, robotic-sounding speech.
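By way of illustration only, the concatenation of stored speech segments described above may be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular synthesizer: the names SEGMENTS and synthesize, the sample values, and the one-sample pause are all hypothetical. Each vocabulary entry is looked up and its stored samples are appended in order. Nothing in the input indicates pitch, duration, or stress, which is why output produced this way tends to sound flat.

```python
# Hypothetical vocabulary: each entry maps a word to its stored speech
# segment, represented here as a short list of audio sample values.
SEGMENTS = {
    "hello": [0.1, 0.3, -0.2],
    "world": [0.4, -0.1],
}


def synthesize(text, segments=SEGMENTS, pause=(0.0,)):
    """Concatenate the stored segment for each word of the text, in order.

    A fixed pause (an assumption of this sketch) is placed between words.
    No prosodic information is applied to the stored samples.
    """
    samples = []
    for word in text.lower().split():
        if word not in segments:
            raise KeyError(f"no stored segment for {word!r}")
        if samples:
            # Separate consecutive words with the fixed pause.
            samples.extend(pause)
        samples.extend(segments[word])
    return samples


if __name__ == "__main__":
    print(synthesize("Hello world"))
```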
Some speech synthesis systems use prosodic information, such as pitch, duration, rhythm, intonation, and stress, to modify or shape the generated speech so that it sounds more natural. In fact, voice characteristic information, such as the above prosodic information, may be used to synthesize the voice of a specific person. Thus, a person's voice may be recreated to “read” a text that the person did not actually read.
However, recreating a person's voice using voice characteristic information raises a number of ethical issues. Once an individual's voice characteristics are extracted and stored, they may be used to speak text whose content the individual finds objectionable or embarrassing. When voice characteristics are transmitted for remote synthesis of speech, the person receiving the voice characteristics may not even know whether the characteristics did indeed come from the appropriate individual.
Therefore, it would be advantageous to provide an improved speech synthesis system for transmitting text messages and certifying voice characteristic profiles for synthesized speech.