The disclosed embodiments relate generally to text-to-speech synthesis, and more particularly to techniques that enable a user to select, from among multiple languages, a language to be used for performing text-to-speech synthesis or conversion.
The process of converting language text to speech is typically referred to as text-to-speech synthesis or text-to-speech conversion. Due to the diversity of languages spoken by humans, various languages are available for performing text-to-speech conversion. A system that can perform text-to-speech conversion in multiple languages typically provides multiple language synthesizers, each language synthesizer configured to convert the text to speech in a particular language. For example, an English language synthesizer may be provided for converting text to English speech, a French language synthesizer may be provided for converting text to French speech, a Japanese language synthesizer may be provided for converting text to Japanese speech, and so on. Depending upon the particular language to be used for the speech, a language synthesizer corresponding to that particular language is used for performing the text-to-speech conversion.
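The arrangement described above, with one synthesizer per language and the synthesizer chosen according to the language desired for the speech, can be illustrated with a minimal sketch. All names here (`LanguageSynthesizer`, `SYNTHESIZERS`, `text_to_speech`) and the language codes are hypothetical and used only for illustration. The synthesizer is a stand-in that returns a labeled string rather than audio.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LanguageSynthesizer:
    """Hypothetical stand-in for a synthesizer configured for one language."""
    language: str

    def synthesize(self, text: str) -> str:
        # A real synthesizer would produce audio; this sketch returns a label
        # so that the choice of synthesizer is visible in the result.
        return f"[{self.language} speech] {text}"


# One synthesizer per supported language (assumed codes: English, French, Japanese).
SYNTHESIZERS: Dict[str, LanguageSynthesizer] = {
    code: LanguageSynthesizer(code) for code in ("en", "fr", "ja")
}


def text_to_speech(text: str, language: str) -> str:
    """Convert text using the synthesizer that corresponds to `language`."""
    try:
        synthesizer = SYNTHESIZERS[language]
    except KeyError:
        raise ValueError(f"no synthesizer for language {language!r}") from None
    return synthesizer.synthesize(text)
```

For example, `text_to_speech("bonjour", "fr")` routes the text to the French stand-in, while an unsupported language code raises `ValueError`.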
For languages that share characters, such as English, French, and German, the same piece of text may be converted to speech using any of the synthesizers corresponding to these languages. However, since a synthesizer for a particular language uses pronunciation rules and sounds that are specific to that language, the speech output for the same piece of text will sound different for different synthesizers. For example, the speech resulting from text-to-speech conversion using an English synthesizer for a piece of text may sound very different from speech resulting from using a French synthesizer for the same piece of text.
In conventional systems, a default language synthesizer is generally selected automatically for performing text-to-speech conversion as long as that default synthesizer can output speech for the text being converted. This may, however, produce speech results that are undesirable to the user. For example, if the text to be converted is in the French language and an English language synthesizer is the default synthesizer, then the output could be French spoken with a bad English accent.
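The conventional behavior described above can be sketched as follows. This is an illustrative model only. It assumes that "can output speech for the text" means that every character of the text belongs to a set the synthesizer supports, and that a system falls back to the next synthesizer in order when the default cannot handle the text. Neither assumption is stated in the description, and the names `Synthesizer`, `can_speak`, and `select_conventional` are hypothetical.

```python
import string
from dataclasses import dataclass
from typing import FrozenSet, Sequence

# Assumed character sets, chosen only to make the example concrete.
BASIC_LATIN = frozenset(string.ascii_letters + string.digits + string.punctuation + " ")
FRENCH_EXTRA = frozenset("àâçéèêëîïôùûüÿœæÀÂÇÉÈÊËÎÏÔÙÛÜŸŒÆ")
HIRAGANA = frozenset(chr(c) for c in range(0x3041, 0x3097))


@dataclass(frozen=True)
class Synthesizer:
    language: str
    supported_chars: FrozenSet[str]

    def can_speak(self, text: str) -> bool:
        # Assumption: a synthesizer can output speech for a text when every
        # character of the text is in its supported set.
        return all(ch in self.supported_chars for ch in text)


def select_conventional(text: str, synthesizers: Sequence[Synthesizer]) -> Synthesizer:
    """Pick the default (first) synthesizer whenever it can speak the text.

    Later entries are considered only if the default cannot handle the text
    (an assumed fallback policy).
    """
    for synthesizer in synthesizers:
        if synthesizer.can_speak(text):
            return synthesizer
    raise ValueError("no synthesizer can output speech for this text")


SYNTHS = [
    Synthesizer("en", BASIC_LATIN),                 # default synthesizer
    Synthesizer("fr", BASIC_LATIN | FRENCH_EXTRA),
    Synthesizer("ja", HIRAGANA),
]

# French text without accented characters is still routed to the English
# default, which is the undesirable outcome the passage describes.
chosen = select_conventional("Bonjour tout le monde", SYNTHS)
print(chosen.language)
```

Under these assumptions, the French sentence is handed to the English default simply because English shares its characters, and the language the text is actually written in is never consulted.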