Speech synthesis is the artificial production of human speech by a machine, such as a computer system running software that generates speech from data. A system used for this purpose is called a speech synthesizer and can be implemented in software or hardware. A conventional text-to-speech (TTS) system converts normal written-language text into speech that can be played through a speaker system for a listener. Other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech. Some conventional TTS systems create synthesized speech by concatenating pieces of recorded speech stored in a database. Alternatively, a conventional TTS synthesizer can incorporate a model of the vocal tract and other characteristics of the human voice to create a completely “synthetic” voice output.

For example, in a method known as Statistical Parametric Synthesis, speech synthesis may be based on Hidden Markov Models (HMMs) in which the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modeled simultaneously. HMMs can also model the degree of voicing, which describes how sound is produced within the vocal tract. Speech typically contains a mix of voiced sounds (i.e., those produced by vibration of the vocal folds) and unvoiced sounds (i.e., those produced by turbulent air passing through a constriction in the vocal tract). A TTS system using parameters generated from the HMMs produces speech waveforms with a speech synthesizer, such as a vocoder, which converts those parameters into a waveform through various transforms. The speech waveforms can then be reproduced as sound through loudspeakers for human listening.
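The source-filter idea behind such a vocoder can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular TTS system: voiced frames are excited by an impulse train at the fundamental frequency (standing in for vocal-fold vibration), unvoiced frames by white noise (standing in for turbulent airflow), and the excitation is passed through a per-frame all-pole filter standing in for the vocal-tract spectrum. The function names, the frame length, the 8 kHz sample rate, and the hand-set parameter tracks are all hypothetical; in an HMM-based system these tracks would be generated from the trained models, and the spectral representation and filter may differ.

```python
import numpy as np


def make_excitation(f0_track, frame_len, sample_rate, seed=0):
    """Source model: impulse train for voiced frames, white noise for unvoiced frames.

    f0_track holds one fundamental-frequency value (Hz) per frame; 0 marks an unvoiced frame.
    """
    rng = np.random.default_rng(seed)
    e = np.zeros(len(f0_track) * frame_len)
    next_pulse = 0.0  # sample position of the next glottal pulse
    for i, f0 in enumerate(f0_track):
        start, stop = i * frame_len, (i + 1) * frame_len
        if f0 > 0:
            # Voiced: one pulse per pitch period, mimicking vocal-fold vibration.
            period = sample_rate / f0
            next_pulse = max(next_pulse, float(start))
            while next_pulse < stop:
                # Amplitude sqrt(period) gives roughly unit average power, matching the noise.
                e[int(next_pulse)] = np.sqrt(period)
                next_pulse += period
        else:
            # Unvoiced: white noise, mimicking turbulent airflow through a constriction.
            e[start:stop] = rng.standard_normal(frame_len)
    return e


def vocal_tract_filter(e, a_track, frame_len):
    """Filter model: all-pole filter y[n] = e[n] - sum_k a[k] * y[n-k], with a[0] == 1.

    a_track holds one coefficient array per frame; filter memory carries across frames.
    """
    y = np.zeros(len(e))
    for n in range(len(e)):
        a = a_track[n // frame_len]
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def synthesize(f0_track, a_track, frame_len=80, sample_rate=8000):
    """Vocoder: excite the time-varying filter with the per-frame source signal."""
    e = make_excitation(f0_track, frame_len, sample_rate)
    return e, vocal_tract_filter(e, a_track, frame_len)


# Hypothetical parameter tracks standing in for HMM-generated ones:
# five voiced frames at 100 Hz followed by five unvoiced frames,
# all sharing one simple single-pole filter.
f0_track = [100.0] * 5 + [0.0] * 5
a_track = [np.array([1.0, -0.9])] * 10
excitation, speech = synthesize(f0_track, a_track)
print(len(speech))  # 800 samples = 10 frames x 80 samples
```

Keeping the filter memory continuous across frame boundaries avoids clicks when the spectral parameters change from one frame to the next, and scaling each pulse by the square root of the pitch period keeps the voiced and unvoiced excitation at comparable loudness.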