Text-to-speech (TTS) is a technology that converts ASCII text into synthetic speech. The speech is produced in a voice that has predetermined characteristics, such as voice sound, tone, accent and inflection. These voice characteristics are embodied in a voice font. A voice font is typically made up of a set of computer-encoded speech segments having phonetic qualities that correspond to phonetic units that may be encountered in text. When a portion of text is converted, speech segments are selected by mapping each phonetic unit to the corresponding speech segment. The selected speech segments are then concatenated and output audibly through a computer speaker.
TTS is becoming common in many environments. A TTS application can be used with virtually any text-based application to audibly present text. For example, a TTS application can work with an email application to essentially “read” a user's email to the user. A TTS application may also work in conjunction with a text messaging application to present typed text in audible form. Such uses of TTS technology a reparticularly relevant to user's who are blind, or who are otherwise visually impaired, for whom reading typed text is difficult or impossible.
In traditional TTS systems, the user can choose a voice font from a number of pre-generated voice fonts. The available voice fonts typically include a limited set of female and/or male voices that are unknown to the user. The voice fonts available in traditional TTS systems are unsatisfactory to many users. Such unknown voices are not readily recognizable by the user or the user's family or friends. Thus, because these voices are unknown to the typical user, these voice fonts do not add as much value or be as meaningful to the user's listening experience as could otherwise be achieved.