1. Field
This invention relates in general to signal processing. Specifically, this invention relates to systems and methods for varying sound characteristics used by text-to-speech engines.
2. General Background and Related Art
A human disc jockey (DJ) makes announcements between previous and subsequent audio programs, such as songs. A DJ may vary the sound characteristics of the DJ's voice during the announcement such that the announcement begins with sound characteristics consistent with those of the previous song and ends with sound characteristics consistent with those of the upcoming song. Smooth transitions between songs and DJ announcements may improve the listening experience.
In the digital era, it is becoming increasingly popular to employ text-to-speech (TTS) engines to perform the conventional tasks of a human DJ. That is, a human DJ may be replaced by a synthetic DJ that synthesizes an audio announcement from the text of the announcement. For example, titles of songs may be input, in textual form, to a TTS-based synthetic DJ. Based on the given titles, the synthetic DJ may then generate appropriate audio signals for the announcement. However, existing synthetic DJ technologies apply a constant set of sound characteristics when synthesizing announcements, regardless of the songs that precede and follow them. This failure to consider context yields unnatural-sounding announcements.
Therefore, what is needed is a system and method that adjusts synthetic DJ sound characteristics depending on the context of an announcement.