A text-to-speech (TTS) system is a human-machine interface that uses speech. TTS systems, which can be implemented in software or hardware, convert normal language text into speech. They are used in many applications, such as car navigation systems, information retrieval over the telephone, voice mail, speech-to-speech translation systems, and comparable applications, with the goal of synthesizing speech with natural human voice characteristics. Modern text-to-speech systems give users access to a multitude of services integrated in interactive voice response systems. Telephone customer service is one example of rapidly proliferating text-to-speech functionality in interactive voice response systems.
Speech synthesizers are an integral part of interactive voice response systems. Improving the quality of a speech synthesizer is a costly process: it requires producing audio, recruiting users to participate, and having those users judge the audio quality. There is minimal or no automation in identifying faults in a speech synthesizer. Moreover, each production of a new speech synthesizer requires a new quality improvement process.
Additionally, a speech synthesizer has one or more algorithms that decide which of multiple speech options sounds best. However, improvements to these algorithms currently require manual evaluation, in which one or more users judge the quality of each improvement. As synthesizers improve further, users are required to evaluate increasingly small improvements, leading to diminishing returns. As a result, as interactive voice response systems get better, improving their quality becomes increasingly cost-prohibitive.