Diphone synthesis is one of the most widely used methods for creating a synthetic voice from recordings or samples of a particular speaker; within some limits, it can capture much of that individual's acoustic quality. A diphone spans two adjacent half-phones: the second half of one phone and the first half of the next. The rationale for this unit is that the "center" of a phonetic realization is its most stable region, whereas the transition from one "segment" to the next contains the most interesting phenomena and is therefore the hardest to model. The diphone thus cuts the speech at points of relative stability rather than at the volatile phone-to-phone transition, where so-called coarticulatory effects appear, so that each transition is preserved intact inside a single recorded unit.
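By way of illustration only, the segmentation described above can be sketched as follows. This minimal sketch assumes that phone boundaries are already known (for example, from a forced alignment) and that the stable center of each phone is its temporal midpoint; the `Segment` type, the `diphones` function, and the millisecond timings are hypothetical and are not part of the disclosed method.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    label: str
    start: float  # milliseconds (assumed unit)
    end: float    # milliseconds (assumed unit)


def diphones(phones: List[Segment]) -> List[Segment]:
    """Cut a phone-segmented utterance into diphone units.

    Each unit runs from the midpoint of one phone (assumed here to be its
    stable center) to the midpoint of the next phone, so the phone-to-phone
    transition falls inside the unit rather than at its edges.
    """
    units = []
    for left, right in zip(phones, phones[1:]):
        start = (left.start + left.end) / 2   # center of the left phone
        end = (right.start + right.end) / 2   # center of the right phone
        units.append(Segment(f"{left.label}-{right.label}", start, end))
    return units


# Hypothetical alignment of the word "cat" surrounded by silence.
utterance = [
    Segment("sil", 0, 200),
    Segment("k", 200, 280),
    Segment("ae", 280, 420),
    Segment("t", 420, 500),
    Segment("sil", 500, 700),
]

for unit in diphones(utterance):
    print(unit.label, unit.start, unit.end)
```

An utterance of n phones yields n − 1 diphone units, and with N phone types there are at most N² possible diphone types.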
The invention herein disclosed presents an exemplary method and apparatus for diphone synthesis, or concatenative synthesis more generally, when the diphone inventory available to the computer system is insufficient or lacks some diphones.