The present invention relates generally to text-to-speech synthesis, and more particularly, to a method for customizing the speaking style of a speech synthesizer based on semantic analysis of the input text.
Text-to-speech synthesizer systems convert character-based text into synthesized audible speech. Text-to-speech synthesizer systems are used in a variety of commercial applications and consumer products, including telephone and voicemail prompting systems, vehicular navigation systems, automated radio broadcast systems, and the like.
Prosody refers to the rhythmic and intonational aspects of a spoken language. When a human speaker utters a phrase or sentence, the speaker will usually, and quite naturally, place accents on certain words or phrases, to emphasize what is meant by the utterance. In contrast, text-to-speech synthesizer systems can have great difficulty simulating the natural flow and inflection of the human-spoken phrase or sentence. Consequently, text-to-speech synthesizer systems incorporate prosodic analysis into the process of rendering synthesizer speech. Although prosodic analysis typically involves syntax assessments of the input text at a very granular level (e.g., at a word or sentence level), it does not involve a semantic assessment of the input text.
Therefore, it is desirable to provide a method for customizing the speaking style of a speech synthesizer based on semantic analysis of the input text.