This invention relates to speech and, more particularly, to speech synthesis.
Harmonic models have been found to be very good candidates for concatenative speech synthesis systems. These models are required to compress the speech database, to perform prosodic modifications where necessary and, finally, to ensure that the concatenation of selected acoustic units results in a smooth transition from one acoustic unit to the next. The main drawback of harmonic models is their complexity. High complexity is a significant disadvantage in real applications of a text-to-speech (TTS) system, where it is desirable to run as many parallel channels as possible on inexpensive hardware. More than 80% of the execution time of synthesis that is based on harmonic models is spent on generating a synthetic (harmonic) signal of the form

$$h(t) = \sum_{k=1}^{K} A_k \cos(k\,\omega_0 t + \phi_k) \qquad (1)$$
where $K = \dfrac{f_s/2}{f_0}$, $f_s$ is the sampling frequency, $f_0$ is the fundamental frequency of the desired harmonic signal in Hz, $\omega_0$ is the fundamental frequency of the desired harmonic signal in radians, $k$ is the harmonic number, and the amplitude coefficients $A_k$ and the phases $\phi_k$ for fundamental $\omega_0$ are given.
There are a number of prior art approaches for generating the signal of equation (1). The straightforward approach directly synthesizes each of the harmonics, multiplies the synthesized signal by the appropriate coefficient, shifts it by the appropriate phase offset, and adds the resulting signal to an accumulated sum. Although modern computers have routines for quickly evaluating trigonometric functions, creating the equation (1) signal in this way is nevertheless quite expensive.
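The straightforward approach can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, $t$ is assumed to be an integer sample index, $\omega_0$ is assumed to be expressed in radians per sample ($2\pi f_0 / f_s$), and $K$ is assumed to be rounded down to an integer. The sketch makes the cost visible: one cosine evaluation per harmonic per output sample.

```python
import math

def num_harmonics(fs, f0):
    """K = (fs/2)/f0, rounded down (the rounding is an assumption)."""
    return int((fs / 2) // f0)

def synthesize_direct(amplitudes, phases, omega0, num_samples):
    """Evaluate h(t) = sum_k A_k*cos(k*omega0*t + phi_k) for t = 0..num_samples-1.

    amplitudes[k-1] and phases[k-1] hold A_k and phi_k for harmonic k.
    omega0 is assumed to be in radians per sample.
    """
    out = []
    for t in range(num_samples):
        acc = 0.0
        for k, (a_k, phi_k) in enumerate(zip(amplitudes, phases), start=1):
            # One trigonometric evaluation per harmonic per sample.
            acc += a_k * math.cos(k * omega0 * t + phi_k)
        out.append(acc)
    return out
```

For example, with a hypothetical $f_s$ = 16 kHz and $f_0$ = 100 Hz, `num_harmonics` gives 80 harmonics, so each output sample costs 80 cosine evaluations.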
Another approach that can be taken employs an FFT. The FFT, however, creates a number of frequency bins that is a power of 2, and the number of harmonics may not be such a number. In such a case, each harmonic can be assigned to the frequency bin that is closest to the desired frequency but, of course, an error is generated. The bigger the FFT, the smaller the error, but also the more processing is required (which consumes resources, e.g., time).
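The size of the bin-assignment error can be illustrated with a short calculation. The sketch below only quantifies the frequency mismatch and does not perform an FFT. The function names, the sampling rate, and the FFT sizes in the usage note are assumptions chosen for illustration.

```python
def nearest_bin_error(f0, fs, fft_size, k):
    """Return (bin_index, error_hz) when harmonic k of f0 is moved to the nearest FFT bin."""
    bin_width = fs / fft_size            # spacing between adjacent bins in Hz
    target = k * f0                      # desired harmonic frequency in Hz
    bin_index = round(target / bin_width)
    return bin_index, bin_index * bin_width - target

def worst_case_error(f0, fs, fft_size):
    """Largest absolute frequency error over all harmonics below fs/2."""
    num_harm = int((fs / 2) // f0)       # K, rounded down (an assumption)
    return max(abs(nearest_bin_error(f0, fs, fft_size, k)[1])
               for k in range(1, num_harm + 1))
```

With the assumed values $f_s$ = 16 kHz and $f_0$ = 100 Hz, a 1024-point FFT places the first harmonic in bin 6 (93.75 Hz), an error of 6.25 Hz. Moving to a 4096-point FFT reduces the worst-case error to about 1.56 Hz at four times the transform size.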
Still another approach that can be taken is to employ recurrence equations. Trigonometric functions whose arguments form a linear sequence of the form
$\theta = \theta_0 + n\delta$ with $n = 0, 1, 2, \ldots,$
are efficiently calculated by the following recurrence:
$\cos(\theta + \delta) = \cos\theta - [\alpha\cos\theta + \beta\sin\theta]$
$\sin(\theta + \delta) = \sin\theta - [\alpha\sin\theta - \beta\cos\theta]$
where $\alpha$ and $\beta$ are the pre-computed coefficients
$\alpha = 2\sin^2(\delta/2)$
$\beta = \sin\delta$.
For each harmonic $k$, the coefficients $\alpha_k$ and $\delta_k$ have to be computed, where $\delta_k = k\omega_0$. The above works adequately only when the increment $\delta$ is small.
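A minimal sketch of the recurrence for a single increment is given below. The function name is hypothetical. The sine update is written as $\sin\theta - [\alpha\sin\theta - \beta\cos\theta]$, which follows from $\sin(\theta+\delta) = \sin\theta\cos\delta + \cos\theta\sin\delta$ with $\cos\delta = 1 - \alpha$. After the initial values and the two coefficients are computed, each step needs only multiplications and additions.

```python
import math

def cos_sin_recurrence(theta0, delta, count):
    """Return [(cos(theta0 + n*delta), sin(theta0 + n*delta)) for n in 0..count-1].

    Only the starting point and the two coefficients use trigonometric calls;
    every later value is obtained from the previous one by the recurrence.
    """
    alpha = 2.0 * math.sin(delta / 2.0) ** 2   # equals 1 - cos(delta)
    beta = math.sin(delta)
    c, s = math.cos(theta0), math.sin(theta0)
    values = []
    for _ in range(count):
        values.append((c, s))
        # Both right-hand sides use the old c and s (tuple assignment).
        c, s = c - (alpha * c + beta * s), s - (alpha * s - beta * c)
    return values
```

In a harmonic synthesizer, one such recurrence would have to be run per harmonic, with the increment set to $k\omega_0$ for harmonic $k$.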
A fast and accurate method for generating a sampled version of the signal

$$h(t) = \sum_{k=1}^{K} A_k \cos(k\,\omega_0 t + \phi_k)$$
is achieved by pre-computing, for each harmonic $k$, a phase delay corresponding to $\phi_k$, expressed as a number of sample delays, for each fundamental frequency $\omega_0$ of interest, and storing the pre-computed values in memory. Also pre-computed and stored in memory are sample values of $\cos(k\omega_0 t)$ and coefficients $A_k$ for each fundamental frequency $\omega_0$ of interest. In operation, a sample of $h(t)$ is generated for a given fundamental frequency by first setting an index $k$ to 1, retrieving the phase delay value corresponding to the value of $k$ and to the given fundamental frequency, subtracting it from a sample time index, $t$, that is multiplied by the value of $k$, and employing the subtraction result, expressed in a modulus related to the fundamental frequency, to retrieve a sample value of cosine $\cos(k\omega_0 t)$ for the given fundamental frequency. The retrieved sample is multiplied by a retrieved coefficient $A_k$ corresponding to the value of $k$ and to the given fundamental frequency, and added to an accumulator. The value of $k$ is incremented, and the process is repeated until the process completes for $k = K$.
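The lookup procedure can be illustrated with the following sketch for a single fundamental frequency. It rests on assumptions that the text above does not state: the pitch period is taken to be an integer number of samples, so that $\omega_0 = 2\pi/\text{period}$ and one stored period of $\cos(\omega_0 n)$ serves every harmonic; each phase $\phi_k$ is rounded to a whole number of sample delays; and the modulus is taken to be the period. All function and variable names are hypothetical.

```python
import math

def build_tables(period, amplitudes, phases):
    """Pre-compute the cosine table and per-harmonic sample delays for one fundamental.

    Assumes omega0 = 2*pi/period with an integer period (in samples).
    """
    omega0 = 2.0 * math.pi / period
    cos_table = [math.cos(omega0 * n) for n in range(period)]
    # cos(k*omega0*t + phi_k) = cos(omega0*(k*t - d_k)) with d_k = -phi_k/omega0.
    # Rounding d_k to a whole sample is a quantization assumption of this sketch.
    delays = [round(-phi / omega0) % period for phi in phases]
    return cos_table, delays, list(amplitudes)

def sample(t, period, cos_table, delays, amplitudes):
    """Generate h(t) using only table lookups, multiplications, and additions."""
    acc = 0.0
    for k in range(1, len(amplitudes) + 1):
        index = (k * t - delays[k - 1]) % period   # subtraction result, modulo the period
        acc += amplitudes[k - 1] * cos_table[index]
    return acc
```

When every phase happens to be an exact multiple of $\omega_0$, this sketch reproduces the direct evaluation of the sum to within floating-point rounding. Otherwise the rounding of the delay introduces a phase error of at most half a sample per harmonic. Supporting several fundamental frequencies would require one set of tables per frequency of interest.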