The present invention relates to a speech processing system of a variable frame length type vocoder and more particularly to improvements in reproduced speech quality.
A speech analysis and synthesis system called a "vocoder" is well known, which extracts feature parameters of an input speech signal for each frame, transmits them from an analysis side to a synthesis side with other speech information and then reproduces the speech signal by making use of the transmitted information.
A variable frame length type vocoder is also known which is capable of remarkably reducing the amount of transmission data. In this type of vocoder, a plurality of frames are optimally approximated by at least one representative frame selected therefrom, and the feature parameters of the representative frame and the number of frames to be replaced with the representative frame are transmitted. This vocoder was proposed by John M. Turner and Bradley W. Dickinson in a paper entitled "A Variable Frame Linear Predictive Coder", International Conference on Acoustics Speech and Signal Processing (ICASSP), 1978, pp. 454 to 457. An optimum rectangular approximation based on Dynamic Programming (DP) was reported by Katsunobu Fushikida in "A Variable Frame Rate Speech Analysis-Synthesis Method Using Optimum Square Wave Approximation", Acoustic Institute of Japan, May 1978, pp. 385 to 386. According to this technique, a predetermined number of frames are classified into a plurality of groups so as to minimize an error, called residue distortion, between the envelope of the feature parameters and the function that approximates it by rectangles. The residue distortion may be expressed as a distance between vectors in the feature parameter space.
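The DP-based rectangular approximation described above may be illustrated by the following minimal sketch. It is not the procedure of the cited papers. It assumes that the residue distortion is the sum of squared Euclidean distances between each frame and the representative frame of its group, and that the number of groups is given in advance. The names `sq_dist`, `segment_cost` and `rectangular_approximation` are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature vectors (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_cost(frames: List[Frame], start: int, end: int) -> Tuple[float, int]:
    """Return (distortion, index) of the best representative in frames[start:end]."""
    best_cost, best_rep = float("inf"), start
    for rep in range(start, end):
        cost = sum(sq_dist(frames[rep], frames[t]) for t in range(start, end))
        if cost < best_cost:
            best_cost, best_rep = cost, rep
    return best_cost, best_rep


def rectangular_approximation(
    frames: List[Frame], num_groups: int
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split frames into num_groups contiguous groups with minimum total distortion.

    Returns the total distortion and, per group, (representative index, frame count).
    """
    n = len(frames)
    if not 1 <= num_groups <= n:
        raise ValueError("num_groups must be between 1 and the number of frames")
    inf = float("inf")
    # cost[k][j]: minimum distortion when frames[0:j] are covered by k groups.
    cost = [[inf] * (n + 1) for _ in range(num_groups + 1)]
    back = [[0] * (n + 1) for _ in range(num_groups + 1)]
    cost[0][0] = 0.0
    for k in range(1, num_groups + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if cost[k - 1][i] == inf:
                    continue
                candidate = cost[k - 1][i] + segment_cost(frames, i, j)[0]
                if candidate < cost[k][j]:
                    cost[k][j], back[k][j] = candidate, i
    # Trace the group boundaries back from the last frame.
    groups: List[Tuple[int, int]] = []
    j = n
    for k in range(num_groups, 0, -1):
        i = back[k][j]
        groups.append((segment_cost(frames, i, j)[1], j - i))
        j = i
    groups.reverse()
    return cost[num_groups][n], groups
```

In this sketch only the representative index and the frame count of each group would need to be transmitted, which is the source of the data reduction.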
Further data reduction is attainable by a "pattern matching vocoder", which is disclosed in a report by Homer Dudley entitled "Phonetic Pattern Recognition Vocoder for Narrow-Band Speech Transmission", The Journal Of The Acoustical Society Of America, Vol. 30, No. 8, August, 1958, pp. 733 to 739, or a report by Raj Reddy and Robert Watkins: "Use Of Segmentation And Labelling In Analysis-Synthesis Of Speech", International Conference on Acoustics Speech and Signal Processing (ICASSP), 1977, pp. 28 to 32.
The pattern matching vocoder selects, from among predetermined reference patterns, the reference pattern most similar to the envelope pattern of the input feature parameters by matching the input pattern against the respective reference patterns, and transmits the label of the selected reference pattern to the synthesis side together with sound source information.
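The selection step may be sketched as a nearest-pattern search. The sketch assumes that similarity is measured by squared Euclidean distance and that the label is simply the index of the reference pattern. Neither assumption is taken from the cited reports, and the name `select_label` is hypothetical.

```python
from typing import List, Sequence, Tuple

Pattern = Sequence[float]


def select_label(
    input_pattern: Pattern, reference_patterns: List[Pattern]
) -> Tuple[int, float]:
    """Return (label, distortion) of the reference pattern closest to the input."""
    if not reference_patterns:
        raise ValueError("at least one reference pattern is required")

    def distortion(label: int) -> float:
        # Assumed measure: squared Euclidean distance between the two patterns.
        ref = reference_patterns[label]
        return sum((x - y) ** 2 for x, y in zip(input_pattern, ref))

    best = min(range(len(reference_patterns)), key=distortion)
    return best, distortion(best)
```

Only the returned label, not the feature parameters themselves, would be sent to the synthesis side, which holds the same set of reference patterns.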
The variable frame length technique is also applicable to this pattern matching vocoder. In this vocoder, called a variable frame length type pattern matching vocoder, the representative pattern is first determined from a plurality of frames, the reference pattern most similar to the representative pattern is then selected, and the label of the selected reference pattern is transmitted with a repeat bit indicating the number of frames to be replaced with the reference pattern. The optimum approximation is made by using rectangular and trapezoid functions on the basis of a DP matching method. The trapezoid function consists of a flat part and an inclination part, as shown in copending and commonly assigned U.S. patent application Ser. No. 544,198.
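The label-plus-repeat scheme may be illustrated by the following sketch, which covers only the rectangular (flat) case and does not model the trapezoid function. It assumes that each section has already been reduced to a representative pattern and a frame count, and that similarity is squared Euclidean distance. The names `encode_sections` and `decode_sections` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pattern = Sequence[float]


def encode_sections(
    sections: List[Tuple[Pattern, int]], reference_patterns: List[Pattern]
) -> List[Tuple[int, int]]:
    """Map each (representative pattern, frame count) to (label, repeat count)."""
    encoded = []
    for representative, count in sections:
        # Assumed measure: squared Euclidean distance to each reference pattern.
        label = min(
            range(len(reference_patterns)),
            key=lambda i: sum(
                (x - y) ** 2 for x, y in zip(representative, reference_patterns[i])
            ),
        )
        encoded.append((label, count))
    return encoded


def decode_sections(
    encoded: List[Tuple[int, int]], reference_patterns: List[Pattern]
) -> List[Pattern]:
    """Rebuild the frame sequence by repeating each labelled reference pattern."""
    frames: List[Pattern] = []
    for label, count in encoded:
        frames.extend([reference_patterns[label]] * count)
    return frames
```

Because the representative pattern and the reference pattern are chosen in two separate steps here, the sketch also exhibits the independence between the two determinations.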
The above-described optimum approximation for each section, however, has the following shortcomings.
Since the representative frame finally selected in the preceding section and the first representative frame in the present section are determined independently, a reduction of the approximation accuracy is unavoidable due to the lack of relation between the representative frames in successive sections.
The optimum approximation using the rectangular function also degrades the approximation accuracy, and hence the reproduced speech quality, due to "time distortion", which is caused by replacing the continuous feature parameter envelope with the rectangular function.
Furthermore, the representative frame for the variable frame length process and the reference pattern for the pattern matching process are determined independently of each other, which causes speech quality degradation. Here, the spectrum distortion caused by pattern matching is called "quantum distortion".