The human voice communicates meaning and identity simultaneously. Typically, an expressive human voice emphasizes syllables, phrases, even paragraphs, to clarify what is being said, and has unique voice characteristics that tell one who is speaking. One objective of speech synthesis can be to create synthesized speech that communicates the voice identity of the speaker and that speaks with rhythms, intonations, and articulations that are close to those of a human being.
Two known approaches for synthesizing speech, formant-based synthesis and concatenation of acoustic units from voice recordings, have shortcomings in this respect. While the concatenated approach using prerecorded speech units can provide a generally identifiable voice, it is usually unable to simultaneously provide expressive voice emphases and intonations that enhance the listener's understanding of the text being synthesized as speech.
U.S. Patent Application Publication No. 2008/0195391 to Marple et al. describes a hybrid speech synthesizer, method and use, which includes embodiments comprising a hybrid of the known formant and concatenation methods for synthesizing speech. As described, speech synthesizer embodiments can predict, locate, and concatenate wave forms in sequence to provide acoustic units for expressive utterances when a specified acoustic unit (or a close facsimile thereof) is found to exist in a database of acoustic units. When the predicted acoustic unit is not found, the synthesizer can manipulate acoustic wave data for an acoustic unit candidate that is close to the predicted values of the ideal candidate so as to create an ideal candidate, or a perceptually acceptable substitute.
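By way of illustration only, the select-or-modify behavior summarized above can be sketched in Python as follows. This sketch is not taken from the Marple et al. publication. The `AcousticUnit` fields, the relative-distance measure, the `tolerance` value, and the function names are all hypothetical, and replacing the stored pitch, duration, and amplitude values merely stands in for actual manipulation of acoustic wave data.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AcousticUnit:
    """Hypothetical record describing one stored acoustic unit."""
    phoneme: str
    pitch_hz: float
    duration_ms: float
    amplitude: float


def distance(unit: AcousticUnit, target: AcousticUnit) -> float:
    """Relative Euclidean distance between a unit and the predicted target.

    Assumption: pitch, duration, and amplitude are weighted equally.
    """
    return math.sqrt(
        ((unit.pitch_hz - target.pitch_hz) / target.pitch_hz) ** 2
        + ((unit.duration_ms - target.duration_ms) / target.duration_ms) ** 2
        + ((unit.amplitude - target.amplitude) / target.amplitude) ** 2
    )


def select_unit(target, database, tolerance=0.05):
    """Return (unit, was_modified) for one predicted acoustic unit."""
    candidates = [u for u in database if u.phoneme == target.phoneme]
    if not candidates:
        raise LookupError(f"no candidate for phoneme {target.phoneme!r}")
    best = min(candidates, key=lambda u: distance(u, target))
    if distance(best, target) <= tolerance:
        # A close facsimile exists in the database: use it unchanged.
        return best, False
    # Otherwise adjust the closest candidate toward the predicted values.
    # Placeholder for real waveform manipulation (assumption).
    adjusted = replace(
        best,
        pitch_hz=target.pitch_hz,
        duration_ms=target.duration_ms,
        amplitude=target.amplitude,
    )
    return adjusted, True


def synthesize(targets, database, tolerance=0.05):
    """Select or create a unit for each target, in utterance order."""
    return [select_unit(t, database, tolerance)[0] for t in targets]
```

In this sketch a unit is concatenated unchanged whenever its distance from the predicted values falls within the tolerance, and the closest candidate is adjusted only when no such unit is found, which mirrors the two cases described above.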
U.S. patent application Ser. No. 12/188,763 to Nitisaroj et al. describes a method of automated text parsing and annotation for expressive prosodies. The annotations indicate how the text is to be pronounced, and the method is useful in speech synthesis and voice recognition. Also described are the abilities of professional voice talents trained to produce expressive speech according to annotations for a particular prosody in terms of articulations, with desired pitches, amplitudes, and rates of speech.
The foregoing description of background art may include insights, discoveries, understandings or disclosures, or associations together of disclosures, that were not known to the relevant art prior to the present invention but which were provided by the invention. Some such contributions of the invention may have been specifically pointed out herein, whereas other such contributions of the invention will be apparent from their context. Merely because a document may have been cited here, no admission is made that the field of the document, which may be quite different from that of the invention, is analogous to the field or fields of the present invention.