To change the pitch of speech, the pitch cycle of a speech signal, which is a cyclical waveform, is conventionally converted to a specific pitch cycle. Pitch Synchronous Overlap and Add (PSOLA) is a known method of pitch conversion processing for converting the pitch cycle of a speech signal, and is widely implemented in the field of speech synthesis. In a PSOLA method, the pitch cycle is converted by cutting out segments of the speech signal at every pitch cycle using a window function with a length of about twice a specific pitch cycle, rearranging the cut-out segments at intervals of the specific pitch cycle, and weighting and overlapping the segments.
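The cut-out, rearrangement, and overlap steps of a PSOLA method can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not a reference implementation: the pitch cycle is taken to be constant and known in samples, pitch marks are assumed to fall at integer multiples of that cycle, a Hann window is used, and the overlapped result is divided by the summed window weights. The function name `psola_pitch_convert` and the parameters `period`, `new_period`, and `half_width` are hypothetical names introduced only for this example.

```python
import numpy as np

def psola_pitch_convert(signal, period, new_period, half_width=None):
    """Toy PSOLA pitch-cycle conversion (illustrative sketch only).

    Assumptions: constant pitch cycle `period` (in samples), pitch marks
    at multiples of `period`, and a Hann window spanning about twice
    `half_width` samples (defaults to `period`).
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    if half_width is None:
        half_width = period
    # Window length is about twice the pitch cycle, centered on a mark.
    window = np.hanning(2 * half_width + 1)
    # Zero-pad so that segments near the edges can still be cut out.
    padded = np.pad(signal, half_width)
    analysis_marks = np.arange(0, n, period)

    out = np.zeros(n)
    weight = np.zeros(n)
    # Place one windowed segment at every interval of the new pitch cycle.
    for t_out in range(0, n, new_period):
        # Cut out the segment around the nearest original pitch mark.
        mark = analysis_marks[np.argmin(np.abs(analysis_marks - t_out))]
        segment = padded[mark:mark + 2 * half_width + 1] * window
        # Clip the placement range to the output buffer.
        lo, hi = t_out - half_width, t_out + half_width + 1
        s_lo, s_hi = max(lo, 0), min(hi, n)
        out[s_lo:s_hi] += segment[s_lo - lo:s_hi - lo]
        weight[s_lo:s_hi] += window[s_lo - lo:s_hi - lo]
    # Normalize by the summed window weights (an assumption of this sketch).
    return out / np.maximum(weight, 1e-8)
```

Calling `psola_pitch_convert(x, period=40, new_period=20)` on a signal `x` whose pitch cycle is 40 samples would rearrange the segments at intervals of 20 samples, corresponding to a conversion from T to T/2.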
However, when a high-pitched voice is synthesized using a PSOLA method, for example when a pitch cycle T of an original speech signal is converted to T/2 (0.5 times the pitch cycle), as illustrated on the top row of FIG. 17, the amplitude of the speech signal is sometimes reduced after pitch cycle conversion, as illustrated on the bottom row of FIG. 17. Moreover, for a case in which the phase signal of the original speech signal changes linearly, as illustrated on the top row of FIG. 18, the bottom row of FIG. 18 illustrates an example of the phase signal of the speech signal after a pitch cycle T of the original speech signal has been converted to T/2 (0.5 times the pitch cycle) using a PSOLA method. In such an example, discontinuities (phase signal jumps) occur in the phase signal in the vicinity of the central portion of each pitch cycle, even though the phase signal of the original speech signal changes linearly.
Accordingly, in cases in which a PSOLA method is employed to convert a pitch cycle to a shorter pitch cycle (for example, 1/1.5 times or less), there is an issue that the sound quality of the speech signal sometimes deteriorates after pitch cycle conversion due to a reduction in amplitude and jumps in phase.
As a method of suppressing deterioration in sound quality with a PSOLA method, a method has been proposed in which, when pitch cycle conversion processing is performed using a PSOLA method, the pitch markers that define the positions at which the speech signal is cut out, weighted, and overlapped are appropriately determined.
There is also a proposal for a speech analysis method in which amplitude data and phase data of a speech signal under analysis are derived, and a pulse train serving as sound source data is set on the time axis of the speech signal so as to correspond to the pitch cycle of the speech signal under analysis. In such a speech analysis method, the difference between the phase data of the set pulse train and the phase data of the speech signal is employed as the phase data for one desired pitch cycle of the speech signal under analysis.
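One possible reading of the phase-difference step in such a speech analysis method is sketched below. It is only an illustration under stated assumptions: a single pitch cycle is analyzed with a discrete Fourier transform, the pulse train is represented by a single unit pulse inside that cycle, and the difference is taken as the signal phase minus the pulse phase, wrapped to the principal range. The sign convention, the function name `pulse_referenced_phase`, and the parameter `pulse_pos` are assumptions introduced for this example and are not taken from the proposal itself.

```python
import numpy as np

def pulse_referenced_phase(cycle, pulse_pos):
    """Amplitude data and pulse-referenced phase data for one pitch cycle.

    Assumptions: `cycle` holds one pitch cycle of samples, and the pulse
    train contributes one unit pulse at sample index `pulse_pos`.
    """
    cycle = np.asarray(cycle, dtype=float)
    spectrum = np.fft.rfft(cycle)
    amplitude = np.abs(spectrum)   # amplitude data of the speech signal
    phase = np.angle(spectrum)     # phase data of the speech signal

    # Phase data of a unit pulse set at pulse_pos on the time axis.
    pulse = np.zeros(len(cycle))
    pulse[pulse_pos] = 1.0
    pulse_phase = np.angle(np.fft.rfft(pulse))

    # Difference of the two phase data sets, wrapped to (-pi, pi].
    phase_diff = np.angle(np.exp(1j * (phase - pulse_phase)))
    return amplitude, phase_diff
```

Under these assumptions, a cycle that consists only of a scaled pulse at `pulse_pos` yields a phase difference of approximately zero in every frequency bin, so the returned phase data describes how the waveform deviates from the pulse rather than where the pulse sits in time.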