A computing device (e.g., a smartphone, tablet, phablet, laptop computer, desktop computer, smart TV, mobile gaming device, smart watch, or smart glasses) can use a speech synthesizer to generate synthesized speech for audibly communicating with a user of the computing device. For example, the computing device may include a speech synthesizer that creates the synthesized speech by concatenating pieces of recorded speech that are stored in the computing device (e.g., in a database). Alternatively, the speech synthesizer can incorporate a model of the vocal tract and other human voice characteristics (a “voice model”) to create a completely synthetic voice output.
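The concatenative approach described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the stored pieces of recorded speech are whole words held as lists of audio samples; the names `UnitDatabase` and `synthesize_concatenative` are hypothetical, and a practical synthesizer would typically use smaller units and smooth the joins between them.

```python
from typing import Dict, List

# Hypothetical unit database: maps a text unit (here, a whole word)
# to the audio samples of its stored recording.
UnitDatabase = Dict[str, List[float]]


def synthesize_concatenative(text: str, database: UnitDatabase) -> List[float]:
    """Build an output waveform by joining the stored recording of each word."""
    samples: List[float] = []
    for word in text.lower().split():
        if word not in database:
            # This sketch has no fallback for units that were never recorded.
            raise KeyError(f"no recording stored for unit {word!r}")
        samples.extend(database[word])
    return samples


# Toy database with made-up sample values (assumption, not real audio).
database: UnitDatabase = {
    "hello": [0.1, 0.2],
    "world": [0.3, 0.4, 0.5],
}
waveform = synthesize_concatenative("Hello world", database)
```

Here `waveform` is simply the samples for "hello" followed by the samples for "world", which is the sense in which the synthesized speech is a concatenation of stored recordings.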
When a speech synthesizer uses recorded speech to generate the synthesized speech, a single voice (e.g., a single voice actor) is typically used to record the speech. Similarly, when a speech synthesizer uses the model approach to create a synthetic voice, the speech synthesizer typically uses only a single voice model. In some situations, the speech synthesizer uses a database that stores speech recorded in different voices (e.g., speech recorded by different voice actors, or by the same voice actor producing different voices), or the speech synthesizer has multiple voice models. In those situations, the user of the computing device may be able to select the voice (e.g., the voice model or voice actor) that the speech synthesizer will use to generate the speech that is used to communicate with the user. The selected voice is then used by the speech synthesizer in subsequent communications with the user. As such, the characteristics of the synthesized speech do not change dynamically over time. For example, all of the speech produced by the speech synthesizer may have the same voice characteristics (e.g., the same emotion, phrasing, intonation, and tone).
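The static behavior described above can be sketched as follows. This is an illustrative model only: the `Voice` and `FixedVoiceSynthesizer` names, the voice identifiers, and the characteristic values are all assumptions made for the example, and the `speak` method returns a description of the utterance rather than audio.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Voice:
    """Hypothetical bundle of voice characteristics for one voice."""
    name: str
    emotion: str
    tone: str


class FixedVoiceSynthesizer:
    """Synthesizer whose voice changes only when the user selects another one."""

    def __init__(self, voices: Dict[str, Voice], default: str) -> None:
        self._voices = voices
        self._selected = voices[default]

    def select_voice(self, name: str) -> None:
        # The user's choice persists for all subsequent communications.
        self._selected = self._voices[name]

    def speak(self, text: str) -> Dict[str, str]:
        # The content of `text` has no influence on the voice characteristics.
        return {
            "text": text,
            "voice": self._selected.name,
            "emotion": self._selected.emotion,
            "tone": self._selected.tone,
        }


# Made-up voices for illustration.
voices = {
    "actor_a": Voice(name="actor_a", emotion="neutral", tone="formal"),
    "actor_b": Voice(name="actor_b", emotion="neutral", tone="warm"),
}
synth = FixedVoiceSynthesizer(voices, default="actor_a")
synth.select_voice("actor_b")
first = synth.speak("Your package has arrived.")
second = synth.speak("Your flight was cancelled.")
```

Both utterances carry the same emotion and tone even though their content differs, which reflects the point that the characteristics of the synthesized speech do not change dynamically once a voice has been selected.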