The following disclosure generally relates to information systems.
In general, conventional text-to-speech application programs produce audible speech from written text. The text can be displayed, for example, in an application program executing on a personal computer or other device. For example, a blind or sight-impaired user of a personal computer can have text from a web page read aloud by the personal computer. Other text-to-speech applications are possible, including those that read from a textual database and provide corresponding audio to a user by way of a communication device, such as a telephone, cellular telephone or the like.
Speech from conventional text-to-speech applications typically sounds artificial or machine-like when compared to human speech. One reason for this result is that current text-to-speech applications often employ synthesis, digitally creating the phonemes to be spoken from mathematical principles so as to mimic human enunciation of those phonemes. Another reason for the distinct sound of computer speech is that phonemes, even when generated from a human voice sample, are typically stitched together with insufficient context. Each voice sample is typically independent of adjacently played voice samples and can have an independent duration, pitch, tone and/or emphasis. When different words rely on the same phoneme as represented by text, conventional text-to-speech applications typically output the same voice sample for that phoneme in each word. As a result, the speech formed from the independent samples often sounds less than desirable.
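By way of illustration only, the context-free stitching described above can be sketched as follows. The sketch assumes a hypothetical inventory holding exactly one voice sample per phoneme; the names VoiceSample, SAMPLES and synthesize, the phoneme labels, and the duration and pitch values are all invented for this example and are not drawn from any particular text-to-speech application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSample:
    """One stored voice sample with fixed, context-independent properties."""
    phoneme: str
    duration_ms: int
    pitch_hz: float


# Hypothetical inventory: exactly one sample per phoneme (values are made up).
SAMPLES = {
    "K": VoiceSample("K", 60, 120.0),
    "B": VoiceSample("B", 55, 118.0),
    "AE": VoiceSample("AE", 110, 130.0),
    "T": VoiceSample("T", 50, 125.0),
}


def synthesize(phonemes):
    """Look up each phoneme in isolation and concatenate the samples.

    No neighboring phoneme is consulted, so duration and pitch never adapt
    to the surrounding sounds.
    """
    return [SAMPLES[p] for p in phonemes]


cat = synthesize(["K", "AE", "T"])
bat = synthesize(["B", "AE", "T"])

# The vowel is the very same sample in both words, regardless of context.
same_vowel = cat[1] is bat[1]
total_cat_ms = sum(s.duration_ms for s in cat)
```

In this sketch the vowel in both words resolves to one and the same stored sample, with the same duration and pitch, which is the behavior the preceding paragraph identifies as a source of machine-like speech.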