Speech synthesis occurs when a speech synthesizer converts written text into audible speech. There are several methods for performing speech synthesis. One example of a speech synthesizer is a concatenative speech synthesizer, which may concatenate several pieces of pre-recorded speech to synthesize the sound of text being read aloud. Another example is a statistical parametric synthesizer, which may adapt various vocal parameters of a system (e.g., frequency spectrum, fundamental frequency, rhythm, stress, and intonation) to create an audio sample that mimics the sound of speech.

Speech synthesizers often use a generic speech synthesis voice, which may result in audible speech that sounds impersonal and artificial. Generating a customized speech synthesis voice for use by a speech synthesizer can be difficult. Typically, to create a customized speech synthesis voice, a user needs to spend a significant amount of time reading a lengthy prepared script in order to provide the sounds a machine requires to learn the enunciation of all words of a particular language. The recorded sounds can then be concatenated together, or used to alter the parameters of a statistical parametric synthesizer model, and thereby serve as the basis for the custom speech synthesis voice.
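
By way of illustration only, the concatenative approach described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular synthesizer: short sine bursts stand in for pre-recorded speech units, the names `tone`, `UNIT_INVENTORY`, and `concatenate` are hypothetical, and the linear crossfade at each seam is an assumed smoothing choice that the text above does not specify.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed for illustration


def tone(freq_hz, n_samples):
    """Stand-in for a pre-recorded speech unit: a short sine burst."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


# Hypothetical inventory mapping unit labels to recorded samples.
UNIT_INVENTORY = {
    "HH": tone(200.0, 800),
    "AY": tone(300.0, 1600),
}


def concatenate(labels, inventory, overlap=80):
    """Join units in order, crossfading `overlap` samples at each seam."""
    out = []
    for label in labels:
        unit = inventory[label]
        # Never overlap more samples than either side actually has.
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight ramps toward the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


samples = concatenate(["HH", "AY"], UNIT_INVENTORY)
```

In this sketch each seam shortens the output by `overlap` samples, so joining an 800-sample unit and a 1600-sample unit yields 2320 samples. A practical system would select units from a much larger inventory of recorded speech, which is why building a custom voice this way typically requires the lengthy recording session described above.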