This invention relates to automatic speech synthesis.
Corpus-based concatenative speech synthesis has received much attention recently, and arguably provides the highest quality synthesized speech available today. In such synthesis systems, a large source corpus of speech is segmented and labeled according to speech units, and new utterances are constructed by concatenating the segments in an order different from that in which they were found in the corpus. In this way, words and word sequences not present in the source corpus are synthesized by the system.
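The idea of reordering labeled segments can be illustrated with a minimal sketch. It is only an illustration under simplifying assumptions: units are whole words, waveforms are short lists of integers, and the first matching segment is always used. The names `build_inventory` and `synthesize` are hypothetical and do not come from any particular system.

```python
from typing import Dict, List, Sequence, Tuple

# A labeled segment: (unit label, stand-in for waveform samples).
Segment = Tuple[str, List[int]]


def build_inventory(corpus: Sequence[Segment]) -> Dict[str, List[List[int]]]:
    """Group the corpus segments by their unit label."""
    inventory: Dict[str, List[List[int]]] = {}
    for label, samples in corpus:
        inventory.setdefault(label, []).append(samples)
    return inventory


def synthesize(target: Sequence[str],
               inventory: Dict[str, List[List[int]]]) -> List[int]:
    """Concatenate one segment per target unit, in the target order."""
    output: List[int] = []
    for label in target:
        candidates = inventory.get(label)
        if not candidates:
            raise KeyError(f"no segment labeled {label!r} in the corpus")
        # Naive choice (an assumption of this sketch): use the first candidate.
        output.extend(candidates[0])
    return output


# Source corpus in its recorded order: "good morning", "good night".
corpus = [("good", [1, 2]), ("morning", [3, 4, 5]),
          ("good", [6, 7]), ("night", [8])]
inventory = build_inventory(corpus)

# "night morning" never occurs in the corpus, yet it can be assembled.
waveform = synthesize(["night", "morning"], inventory)
```

A practical system would also choose among the candidates for each unit and smooth the joins between segments; this sketch omits both.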
Much of the work to date has concentrated on techniques for processing waveform segments in order to smoothly concatenate them without introducing unnatural artifacts. Another area of work has considered approaches to making the search for segments to concatenate computationally efficient. Note that as the size of the corpus grows, the quality of the synthesized speech may increase due to the availability of better-matching segments, but the computational cost of finding those better-matching segments can grow unacceptably. Some work on segment selection has addressed methods of limiting the computation required for the search by pre-processing the source corpus.
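The trade-off between corpus size and search cost can be sketched as follows. This is a hedged illustration, not a description of any particular prior method: it assumes each segment carries a single numeric feature (here called `pitch`) used to judge how well it matches, and that pre-processing consists only of indexing segments by label. The function names are hypothetical.

```python
from collections import defaultdict


def best_match_scan(corpus, label, target_pitch):
    """Exhaustive search: every segment in the corpus is examined."""
    examined = 0
    best = None
    for seg in corpus:
        examined += 1
        if seg["label"] != label:
            continue
        if best is None or (abs(seg["pitch"] - target_pitch)
                            < abs(best["pitch"] - target_pitch)):
            best = seg
    return best, examined


def preprocess(corpus):
    """One-time pre-processing: index the segments by unit label."""
    index = defaultdict(list)
    for seg in corpus:
        index[seg["label"]].append(seg)
    return index


def best_match_indexed(index, label, target_pitch):
    """Search only the segments that already carry the wanted label."""
    candidates = index.get(label, [])
    best = min(candidates,
               key=lambda s: abs(s["pitch"] - target_pitch),
               default=None)
    return best, len(candidates)


# Toy corpus: labels a, b, c repeated, with pitch values 100, 110, ..., 180.
corpus = [{"label": lab, "pitch": 100 + 10 * i}
          for i, lab in enumerate("abcabcabc")]
index = preprocess(corpus)

scan_best, scan_examined = best_match_scan(corpus, "b", 145)
idx_best, idx_examined = best_match_indexed(index, "b", 145)
```

Both searches return the same segment, but the exhaustive scan examines all nine segments while the indexed search examines only the three labeled `b`. The cost of the scan grows with the whole corpus, whereas the indexed search grows only with the number of candidates for the requested unit.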