The present invention relates to methods and computerized systems for providing synthesized or artificial speech, typically from a text input, employing novel prosodic speech text codes.
Synthesized, artificial or machine speech has many useful applications, for example, in voice mail systems, electronically enabled appliances, automobiles, computers, robotic assistants, games and the like, in spoken books and magazines, drama and other entertainment. The present invention extends to implementation in any such systems, as will be apparent from the disclosure hereinbelow.
Useful known systems for generating artificial speech are generally described as concatenated systems or formant systems. Concatenated artificial speech systems may be used, for example, in interactive voice mail systems and employ prerecorded complete phrases or sentences to yield a tolerably human speech sound. However, such systems are not suitable for the conversion to speech of extensive tracts of unknown text such as magazine articles or books. Formant systems, which synthesize small slices of vocal or voice-like sounds “on the fly” as the text is machine read or otherwise processed by the computerized system, are more suitable for such larger tracts. However, until recently the output of such formant speech systems was notoriously mechanical, monotonous or machine-like.
Stevens U.S. Pat. No. 5,748,838, assigned to Sensimetrics Corporation (Cambridge, Mass.), discloses a speech synthesizing method which uses glottal modeling to determine ten or fewer high level parameters and transform them into thirty-nine low level parameters using mapping relations. These parameters are input to a speech synthesizer to enable speech to be synthesized more simply than with prior art systems that required 50 to 60 parameters to be input to represent any particular speech. While the Stevens disclosure may be useful for its intended purposes, the somewhat mechanistic modeling of the vocal anatomy employed by Stevens does not yield a speech output having appealing humanistic qualities. Nor does Stevens provide or suggest a means for adding desirable prosody or for controlling and modifying the prosody of synthetically or artificially generated speech.
As described in commonly owned Addison et al. U.S. Pat. No. 6,847,931, copending U.S. patent application Ser. No. 10/334,658 (“Addison '658”) and international patent publication number WO/2003/065349, text to be synthesized may be marked up with speech training notations as a pronunciation guide for intelligibility. Addison '658 provides for expressive parsing in speech synthesis and employs trained speakers to generate speech element databases that can be utilized to implement expressive synthesis of speech from text. Neither the Lessac system nor other known systems provide a simple method for communicating desired prosody to a speech synthesizer in a manner that permits control of the prosody of the output speech.
Good American Speech by Margaret Prendergast McLean, E.P. Dutton & Co., Inc. (1952) (“McLean” hereinafter), describes a system of notations for marking text to instruct the reader as to desired intonation patterns, or changes of pitch during connected speech, that should be adopted to avoid faults such as monotony or peculiar or dialectal intonation. This work preceded modern attempts to computerize speech, and nothing in the art suggests any usefulness of the McLean intonation patterns to solve present-day problems in synthesizing speech. Furthermore, McLean's intonation patterns lack any means of referencing pitch, making it difficult for different speakers to utilize the intonation patterns in a consistent manner.
The foregoing description of background art may include insights, discoveries, understandings or disclosures, or associations together of disclosures, that were not known to the relevant art prior to the present invention but which were provided by the invention. Some such contributions of the invention may have been specifically pointed out herein, whereas other such contributions of the invention will be apparent from their context. Merely because a document may have been cited here, no admission is made that the field of the document, which may be quite different from that of the invention, is analogous to the field or fields of the present invention.