Text-to-speech (TTS) is a technology that converts computerized text into synthetic speech. The speech is produced in a voice that has predetermined characteristics, such as voice sound, tone, accent and inflection. These voice characteristics are embodied in a voice font. A voice font is typically made up of a set of computer-encoded speech segments having phonetic qualities that correspond to phonetic units that may be encountered in text. When a portion of text is converted, speech segments are selected by mapping each phonetic unit to the corresponding speech segment. The selected speech segments are then concatenated and output audibly through a computer speaker.
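The selection-and-concatenation step described above can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the names VoiceFont, synthesize and toy_font are hypothetical, the phonetic unit labels are arbitrary, and each speech segment is reduced to a short list of numeric audio samples rather than real encoded speech.

```python
from typing import Dict, List

# Hypothetical voice font: each phonetic unit maps to one pre-recorded
# speech segment, represented here as a short list of audio samples.
VoiceFont = Dict[str, List[float]]


def synthesize(phonetic_units: List[str], font: VoiceFont) -> List[float]:
    """Map each phonetic unit to its speech segment and concatenate them."""
    output: List[float] = []
    for unit in phonetic_units:
        if unit not in font:
            # The font has no segment for this unit, so it cannot be spoken.
            raise KeyError(f"voice font has no segment for unit {unit!r}")
        output.extend(font[unit])
    return output


# Toy font with two illustrative phonetic units (assumed labels and values).
toy_font: VoiceFont = {
    "HH": [0.1, 0.2],
    "AY": [0.3, 0.4, 0.5],
}

# Concatenated samples for the unit sequence HH, AY.
samples = synthesize(["HH", "AY"], toy_font)
```

A practical system would first convert the text into phonetic units and would typically smooth the joins between segments before playback; both steps are omitted from this sketch.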
TTS is becoming common in many environments. A TTS application can be used with virtually any text-based application to audibly present text. For example, a TTS application can work with an email application to essentially “read” a user's email to the user. A TTS application may also work in conjunction with a text messaging application to present typed text in audible form. Such uses of TTS technology are particularly relevant to users who are blind, or who are otherwise visually impaired, for whom reading typed text is difficult or impossible. More generally, TTS is part of the evolution toward natural user interfaces for computers.
In some TTS systems, the user can choose a voice font from a number of pre-generated voice fonts. The available voice fonts typically include a limited set of voice patterns that are unrelated to the author of the text. The voice fonts available in traditional TTS systems are unsatisfactory to many users. Such unknown voices are not readily recognizable by the user or the user's family or friends. Thus, because these voices are unknown to the typical receiver of the message, these voice fonts do not add as much value to the receiver's listening experience, and are not as meaningful to the receiver, as could otherwise be achieved.
Additionally, no method has been described for dynamically acquiring PVFs so that a text file can be read out in the author's voice.
The present invention provides a solution to these problems.