The PICOLA (Pointer Interval Control Overlap and Add) method is known as a method for converting a reproducing rate to an arbitrary rate without changing the pitch of the voice. The principle of the PICOLA method is introduced in "Time-Scale Modification Algorithm for Speech by Use of Pointer Interval Control Overlap and Add (PICOLA) and Its Evaluation" by MORITA, Naotaka and ITAKURA, Fumitada, Proceedings of the National Meeting of The Acoustical Society of Japan, 1-4-14 (October 1986).
The application of the PICOLA method to voice signals divided into frames, which converts the reproducing rate with less buffer memory, is disclosed in Japanese Unexamined Patent Publication No. 8-137491.
FIG. 9 is a block diagram of a conventional apparatus for converting a voice reproducing rate by the PICOLA method. In this apparatus, digitized voice signals are recorded in recording media 1, and framing section 2 fetches the voice signal from recording media 1 in frames of a predetermined length of LF samples. The voice signal fetched by framing section 2 is stored temporarily in buffer memory 3 and is also provided to pitch period calculating section 6. Pitch period calculating section 6 calculates pitch period Tp of the voice signal and provides it to waveform overlapping section 4, and also stores a pointer to the processing start position in buffer memory 3. Waveform overlapping section 4 overlaps the waveforms of the voice signals stored in buffer memory 3 using the pitch period of the input voice, and outputs the overlapped waveform to waveform synthesizing section 5. Waveform synthesizing section 5 synthesizes the output voice signal waveform from the voice signal waveform stored in buffer memory 3 and the overlapped waveform produced by waveform overlapping section 4, and provides it as the output voice.
In this apparatus, the reproducing rate is converted without changing the pitch by the following process.
First, the processing method for high rate reproducing is explained with reference to FIG. 10 and FIG. 11. In the figures, P0 is a pointer indicating the head of a waveform overlap processing frame. In the waveform overlap processing, the processing frame is LW samples long, which is two periods of voice pitch period Tp. When the rate of the input voice is 1 and the desired reproducing rate is r, L is the number of samples given by the following formulation.
$$L = \frac{T_p}{r - 1} \tag{1}$$
L is the number of samples in output waveform (c): an input voice of Tp+L samples is reproduced as an output voice of L samples, as described later. Accordingly, r=(Tp+L)/L, and solving this for L gives formulation (1).
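As an illustration of formulation (1), the following minimal sketch computes L for a few rates and checks that (Tp+L)/L returns the requested rate. The function name `fast_rate_length` and the rounding of L to a whole number of samples are assumptions made for this example; the source does not state how L is rounded.

```python
def fast_rate_length(tp: int, r: float) -> int:
    """Return L = Tp / (r - 1) from formulation (1), in whole samples.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if r <= 1.0:
        raise ValueError("high rate reproducing requires r > 1")
    return round(tp / (r - 1.0))


tp = 80  # example pitch period in samples (illustrative value)
for r in (1.5, 2.0, 3.0):
    length = fast_rate_length(tp, r)
    # (Tp + L) input samples are reproduced as L output samples.
    print(r, length, (tp + length) / length)
```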
The input voice fetched from recording media 1 by framing section 2 is stored in buffer memory 3. Concurrently, pitch period calculating section 6 calculates pitch period Tp of the input voice and inputs it to waveform overlapping section 4. Pitch period calculating section 6 also calculates L from pitch period Tp using formulation (1), determines P0', the starting position for the next processing, and provides it to buffer memory 3 as a pointer into the buffer memory.
Waveform overlapping section 4 fetches from buffer memory 3 the waveform of the waveform overlap processing frame of LW (=2Tp) samples, starting from the processing start position indicated by pointer P0. It applies a triangular window function that decreases the first half of the processing frame (waveform A) along the time axis and increases the latter half (waveform B) along the time axis, then adds waveform A and waveform B to calculate overlapped waveform C.
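The overlap described above can be sketched as a linear cross-fade of the two pitch periods. The exact window samples are not given in the source, so the weights n/Tp used here are an assumption, and `overlap_fast` is a hypothetical name.

```python
def overlap_fast(frame, tp):
    """Cross-fade a 2*tp-sample processing frame into tp samples.

    Waveform A (first half) is weighted by a falling ramp and waveform B
    (latter half) by a rising ramp; the two are then added to give C.
    """
    if len(frame) != 2 * tp:
        raise ValueError("processing frame must be LW = 2*Tp samples long")
    a = frame[:tp]
    b = frame[tp:]
    c = []
    for n in range(tp):
        w = n / tp  # assumed triangular window: weight of B rises from 0
        c.append((1.0 - w) * a[n] + w * b[n])
    return c


# A constant 1.0 fading into a constant 3.0 over four samples.
print(overlap_fast([1.0] * 4 + [3.0] * 4, 4))  # [1.0, 1.5, 2.0, 2.5]
```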
Waveform synthesizing section 5 removes the waveform of the waveform overlap processing frame (waveform A + waveform B) from the input voice waveform and inserts the overlapped waveform (waveform C) illustrated in FIG. 10 in its place. Input voice waveform D is then appended to the overlapped waveform up to P0', which indicates the point (P0+Tp+L) on the input waveform (this corresponds to P1, the point L samples from the head of waveform C on the synthesized waveform). When r > 2, P1 lies within waveform C; in this case, waveform C is output only up to the position indicated by P1.
As a result, the length of synthesized output waveform (c) is L samples, so an input voice of Tp+L samples is reproduced as an output voice of L samples. The next waveform overlap processing starts from point P0' on the input waveform.
FIG. 11 illustrates the relation between the voice signals stored in buffer memory 3 and the framing by framing section 2 in the processing explained above with reference to FIG. 10.
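One processing step of high rate reproducing, as described above, might look like the following sketch. It operates on a plain list rather than on the framed buffer memory, and the name `fast_rate_step`, the linear cross-fade weights, and the rounding of L are all assumptions of this example.

```python
def fast_rate_step(x, p0, tp, r):
    """Return (output samples, next pointer P0') for one overlap step.

    Tp + L input samples starting at p0 are reduced to L output samples.
    """
    if r <= 1.0:
        raise ValueError("high rate reproducing requires r > 1")
    length = round(tp / (r - 1.0))  # formulation (1); rounding is assumed
    if max(p0 + 2 * tp, p0 + tp + length) > len(x):
        raise ValueError("not enough input samples for this step")
    a = x[p0:p0 + tp]
    b = x[p0 + tp:p0 + 2 * tp]
    # Overlapped waveform C: A fades out while B fades in.
    c = [(1.0 - n / tp) * a[n] + (n / tp) * b[n] for n in range(tp)]
    if length <= tp:
        out = c[:length]  # P1 lies within C (r >= 2): output C up to P1
    else:
        out = c + x[p0 + 2 * tp:p0 + tp + length]  # C followed by D up to P0'
    return out, p0 + tp + length


x = [float(n % 10) for n in range(100)]  # synthetic input with period 10
out, next_p0 = fast_rate_step(x, 0, 10, 1.5)
print(len(out), next_p0)  # 20 30
```

In this usage example 30 input samples are consumed and 20 output samples are produced, which corresponds to r = 1.5.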
In principle, the buffer length that buffer memory 3 needs for the waveform overlap processing is two periods of the maximum pitch period Tp max of the input voice. However, since the input voice is input in frames of a predetermined length LF, processing start position P0 may be located at an arbitrary position in the first frame of the input voice, and the buffer length must be an integer multiple of the input frame length. Accordingly, the buffer length is the smallest multiple of LF that is at least (LF+2Tp max). For instance, when the input frame length LF is 160 samples and the maximum pitch period Tp max is 145 samples, LF+2Tp max is 450 samples, so the buffer length must be 3LF=480 samples.
In the buffer memory processing, the content of the buffer memory is shifted each time LF samples are input, and the waveform overlapping is performed only when processing start position P0 falls within the first frame. Otherwise, the input signals are provided as output signals without processing.
Next, the method for low rate reproducing is explained with reference to FIG. 12.
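The buffer length rule can be checked with a short calculation. The sketch below assumes that the required length is the smallest multiple of LF that is greater than or equal to LF+2Tp max; the function name `buffer_length` is hypothetical.

```python
def buffer_length(lf: int, tp_max: int) -> int:
    """Smallest multiple of lf that is >= lf + 2 * tp_max (assumed reading)."""
    needed = lf + 2 * tp_max
    frames = -(-needed // lf)  # integer ceiling division
    return frames * lf


print(buffer_length(160, 145))  # 480, i.e. 3 frames of 160 samples
```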
As in high rate reproducing, P0 is a pointer indicating the head of a waveform overlap processing frame. In the waveform overlap processing, the processing frame is LW samples long, which is two periods of voice pitch period Tp. When the rate of the input voice is 1 and the desired reproducing rate is r, L is the number of samples given by the following formulation.
$$L = \frac{T_p \, r}{1 - r} \tag{2}$$
In low rate reproducing, an input voice of L samples is reproduced as an output voice of Tp+L samples, as described later. Accordingly, r=L/(Tp+L), and solving this for L gives formulation (2).
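Formulation (2) can be illustrated in the same way. In this minimal sketch the name `slow_rate_length` and the rounding of L to whole samples are assumptions; the printed ratio L/(Tp+L) should come back close to the requested rate.

```python
def slow_rate_length(tp: int, r: float) -> int:
    """Return L = Tp * r / (1 - r) from formulation (2), in whole samples.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("low rate reproducing requires 0 < r < 1")
    return round(tp * r / (1.0 - r))


tp = 80  # example pitch period in samples (illustrative value)
for r in (0.5, 0.75, 0.8):
    length = slow_rate_length(tp, r)
    # L input samples are reproduced as (Tp + L) output samples.
    print(r, length, length / (tp + length))
```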
Waveform overlapping section 4 applies the triangular window function so that the first half of the processing frame (waveform A) increases along the time axis and the latter half of the processing frame (waveform B) decreases along the time axis, then adds waveform A and waveform B to calculate overlapped waveform C.
Waveform synthesizing section 5 inserts the overlapped waveform (waveform C) between waveform A and waveform B of the input signal waveform (a) illustrated in FIG. 12. Input voice waveform B is then appended to the overlapped waveform up to P0', which indicates the point (P0+L) on the input waveform (this corresponds to P1, the point L samples from the head of waveform C on the synthesized waveform). When r > 0.5, P1 is not on input voice waveform B but on waveform D, which follows the overlap processing frame; in this case, waveform D is output up to the position indicated by P0'.
As a result, the length of synthesized output waveform (c) is Tp+L samples, so an input voice of L samples is reproduced as an output voice of Tp+L samples. The next waveform overlap processing starts from point P0' on the input waveform.
The relation between the voice signals stored in buffer memory 3 and the framing by framing section 2 is the same as in high rate reproducing.
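One processing step of low rate reproducing might be sketched as follows. The sketch is restricted to 0.5 <= r < 1 (so that L is at least Tp), which is an assumption made to keep the example simple; the name `slow_rate_step`, the linear cross-fade weights, and the rounding of L are likewise assumptions, and a plain list stands in for the framed buffer memory.

```python
def slow_rate_step(x, p0, tp, r):
    """Return (output samples, next pointer P0') for one overlap step.

    L input samples starting at p0 are extended to Tp + L output samples.
    """
    if not 0.5 <= r < 1.0:
        raise ValueError("this sketch only covers 0.5 <= r < 1")
    length = round(tp * r / (1.0 - r))  # formulation (2); rounding is assumed
    if max(p0 + 2 * tp, p0 + length) > len(x):
        raise ValueError("not enough input samples for this step")
    a = x[p0:p0 + tp]
    b = x[p0 + tp:p0 + 2 * tp]
    # Overlapped waveform C: A fades in while B fades out.
    c = [(n / tp) * a[n] + (1.0 - n / tp) * b[n] for n in range(tp)]
    # A, then the inserted C, then the input from B onward up to P0' = P0 + L.
    out = a + c + x[p0 + tp:p0 + length]
    return out, p0 + length


x = [float(n % 10) for n in range(100)]  # synthetic input with period 10
out, next_p0 = slow_rate_step(x, 0, 10, 0.8)
print(len(out), next_p0)  # 50 40
```

In this usage example 40 input samples are consumed and 50 output samples are produced, which corresponds to r = 0.8.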
In the apparatus for converting a voice reproducing rate described above, the pitch period of the input voice is obtained and the waveforms are overlapped on the basis of that pitch period. A segment of input voice one pitch period long is called a pitch waveform. Since neighboring pitch waveforms are generally highly similar to each other, they are well suited to waveform overlap processing.
However, if an error occurs in the pitch period calculation, the difference between neighboring pitch waveforms increases, which causes the problem that the quality of the output voice after waveform overlapping decreases. The primary cause of a pitch period calculation error is considered to be the following. Generally, the calculated pitch period represents a certain interval of the input voice (called the pitch period analysis interval). When the pitch period varies drastically within the pitch period analysis interval, the difference between the calculated pitch period and the actual pitch period increases. Accordingly, to suppress the decrease in quality of the output voice, it is necessary to obtain the most appropriate pitch waveform at the position of the waveform overlap processing.
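The effect of a pitch period error on neighboring pitch waveforms can be illustrated with a synthetic signal. The sketch below uses a sine wave with a period of 50 samples and compares the two adjacent segments obtained with the correct period and with an erroneous period of 40 samples. The mean squared difference used as the similarity measure and the name `pitch_waveform_mismatch` are assumptions chosen for this illustration, not part of the conventional apparatus.

```python
import math


def pitch_waveform_mismatch(x, p0, tp):
    """Mean squared difference between the two adjacent tp-sample segments."""
    a = x[p0:p0 + tp]
    b = x[p0 + tp:p0 + 2 * tp]
    return sum((u - v) ** 2 for u, v in zip(a, b)) / tp


true_period = 50  # actual pitch period of the synthetic input, in samples
x = [math.sin(2.0 * math.pi * n / true_period) for n in range(400)]

good = pitch_waveform_mismatch(x, 0, 50)  # pitch period calculated correctly
bad = pitch_waveform_mismatch(x, 0, 40)   # pitch period calculated with an error
print(good < bad)  # True
```

With the correct period the two segments are practically identical, whereas with the erroneous period they differ substantially, so the cross-faded waveform would no longer resemble either pitch waveform.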