A speech waveform is represented with time on the horizontal axis and amplitude on the vertical axis.
For speech synthesis, a speech waveform is prepared for each segment from previously recorded speech of a speaker. The waveforms of the segments corresponding to the speech to be output are then concatenated to obtain synthesized speech.
The speech waveform of each segment is cut out at the pitch cycle. Each cut-out waveform is called a pitch waveform. Because the waveform of one segment is cut at every pitch cycle, a plurality of pitch waveforms are generated per segment. The pitch cycle is the reciprocal of the pitch frequency (fundamental frequency).
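The cutting step can be illustrated with a minimal sketch. It assumes a constant pitch frequency over the segment and cuts consecutive, non-overlapping blocks one pitch cycle long; practical systems typically track pitch marks and apply windowing, which is not shown here. The function names `pitch_cycle_samples` and `cut_pitch_waveforms` and the sampling-rate parameter are hypothetical and do not come from the source.

```python
def pitch_cycle_samples(pitch_hz, sample_rate_hz):
    """Pitch cycle in samples: the reciprocal of the pitch frequency,
    converted from seconds to samples (assumed constant over the segment)."""
    if pitch_hz <= 0:
        raise ValueError("pitch frequency must be positive")
    return max(1, round(sample_rate_hz / pitch_hz))


def cut_pitch_waveforms(segment, pitch_hz, sample_rate_hz):
    """Cut one segment's waveform into consecutive pitch waveforms.

    Simplification: blocks do not overlap, and a trailing remainder
    shorter than one pitch cycle is dropped.
    """
    period = pitch_cycle_samples(pitch_hz, sample_rate_hz)
    return [
        segment[start:start + period]
        for start in range(0, len(segment) - period + 1, period)
    ]


# Example: a 400-sample segment at 16 kHz with a 200 Hz pitch frequency
# has a pitch cycle of 80 samples, so it yields 5 pitch waveforms.
segment = [0.0] * 400
pitch_waveforms = cut_pitch_waveforms(segment, pitch_hz=200.0, sample_rate_hz=16000)
```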
One conceivable method for eliminating power imbalance in synthesized speech is to apply compression processing to the recorded speech or to the synthesized speech. FIG. 11 is a schematic diagram illustrating exemplary compression processing on a speech waveform. The power envelope of a speech waveform 91 before the compression processing can be schematically expressed as a power envelope 92. After the compression processing, the power envelope of the speech waveform looks like a power envelope 93.
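The source does not specify how the compression processing is carried out. As one possible illustration only, the sketch below applies a simple static, sample-by-sample compressor: amplitudes above a threshold are reduced by a fixed ratio, which flattens the peaks of the envelope. The function name `compress` and the `threshold` and `ratio` parameters are assumptions made for this example.

```python
def compress(samples, threshold, ratio):
    """Static compressor (illustrative assumption, not the source's method).

    The part of each sample's magnitude that exceeds `threshold` is
    divided by `ratio`; samples at or below the threshold are unchanged.
    """
    if threshold < 0 or ratio < 1:
        raise ValueError("threshold must be >= 0 and ratio must be >= 1")
    compressed = []
    for value in samples:
        magnitude = abs(value)
        if magnitude > threshold:
            magnitude = threshold + (magnitude - threshold) / ratio
        # Restore the original sign of the sample.
        compressed.append(magnitude if value >= 0 else -magnitude)
    return compressed


# Loud samples are pulled toward the threshold; quiet ones pass through.
result = compress([0.2, -0.9, 0.5], threshold=0.5, ratio=4.0)
```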
PLT 1 describes a speech synthesis device. The speech synthesis device described in PLT 1 performs waveform normalization processing as follows. The device extracts a one-pitch waveform. Denoting this waveform as X[i] (i=1, . . . , N), an average amplitude Px is expressed as in Equation (1).
[Math. 1]
$$P_X = \frac{1}{N}\left\{\sum_{i=1}^{N}\left(X[i]\right)^{2}\right\} \qquad \text{Equation (1)}$$
The speech synthesis device described in PLT 1 then calculates Equation (2) below, using a predetermined value A, to acquire normalized waveform information S[i].
$$S[i] = X[i] \times \frac{A}{P_X} \qquad \text{Equation (2)}$$
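Equations (1) and (2) can be traced with a short sketch. It follows the formulas as written above: Px is the mean of the squared samples of the one-pitch waveform, and each sample is scaled by A/Px. The function names and the zero-power check are assumptions added for illustration and are not taken from PLT 1.

```python
def average_amplitude(x):
    """Equation (1): Px = (1/N) * sum of X[i]^2 over the one-pitch waveform."""
    n = len(x)
    if n == 0:
        raise ValueError("waveform must contain at least one sample")
    return sum(value * value for value in x) / n


def normalize_waveform(x, a):
    """Equation (2): S[i] = X[i] * A / Px for a predetermined value A."""
    px = average_amplitude(x)
    if px == 0:
        # Assumption: an all-zero waveform cannot be normalized.
        raise ValueError("waveform has zero power")
    return [value * a / px for value in x]


# Worked example: X = [1, -1, 2, -2] gives Px = (1 + 1 + 4 + 4) / 4 = 2.5.
# With A = 5, every sample is scaled by 5 / 2.5 = 2.
x = [1.0, -1.0, 2.0, -2.0]
px = average_amplitude(x)
s = normalize_waveform(x, a=5.0)
```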