The present invention relates to a reference speech pattern generating method for generating, from learning speech, reference patterns to be used in speech coding, speech recognition, text-to-speech synthesis for converting a sentence into speech, or the like, in which pattern matching is performed.
As a speech coding method using a pattern matching technique, a segment vocoder was proposed in ICASSP'82, Bolt Beranek and Newman Inc., "Segment Quantization for Very-Low-Rate Speech Coding". According to this method, as shown in FIG. 1, a speech signal from an input terminal 11 is converted by a spectral analysis and segmentation section 20 into a time series of spectral patterns 12, which is divided into several segments S.sub.1, S.sub.2 and S.sub.3 of variable time length, and each segment is coded in a quantization section 14 by matching it with a reference pattern read out of a reference pattern memory 13.
In coding methods of the type which process the input speech in units of segments, it is generally important to decide what method should be employed for each of (1) segment division, (2) pattern matching, and (3) reference pattern generation. The above-mentioned segment vocoder, for (1), divides the input speech into variable length segments on the basis of its rate of spectral change; for (2), performs spectral matching based on equal interval sampling of the trajectory in a spectral parameter space; and, for (3), generates reference patterns by random learning.
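Steps (1) and (2) of such a segment-based coder can be illustrated by the following minimal sketch. It is not the implementation of the cited segment vocoder: the boundary rule (a simple threshold on the frame-to-frame spectral change), the Euclidean distance measure, the number of sampling points, and all function and parameter names (`segment_by_spectral_change`, `resample_trajectory`, `quantize`, `threshold`, `n_points`) are assumptions made only for illustration. The reference patterns of step (3) are taken as given and their generation is not shown.

```python
import math


def euclid(a, b):
    """Euclidean distance between two spectral parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_by_spectral_change(frames, threshold):
    """Step (1), assumed rule: start a new segment whenever the
    frame-to-frame spectral change exceeds `threshold`."""
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if euclid(prev, cur) > threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def resample_trajectory(segment, n_points):
    """Step (2), first half: sample the trajectory of a segment at
    n_points (>= 2) positions equally spaced along its arc length in
    the spectral parameter space. Timing within the segment is
    discarded by this operation."""
    cum = [0.0]  # cumulative arc length at each frame
    for a, b in zip(segment, segment[1:]):
        cum.append(cum[-1] + euclid(a, b))
    total = cum[-1]
    if total == 0.0:  # single frame or stationary segment
        return [list(segment[0]) for _ in range(n_points)]
    out, j = [], 0
    for k in range(n_points):
        target = total * k / (n_points - 1)
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        span = cum[j + 1] - cum[j]
        t = 0.0 if span == 0.0 else min(1.0, (target - cum[j]) / span)
        out.append([x + t * (y - x)
                    for x, y in zip(segment[j], segment[j + 1])])
    return out


def quantize(segment, codebook, n_points):
    """Step (2), second half: return the index of the reference pattern
    with the smallest summed squared distance to the resampled segment.
    Each reference pattern is a list of n_points vectors."""
    pattern = resample_trajectory(segment, n_points)

    def distortion(ref):
        return sum(euclid(p, r) ** 2 for p, r in zip(pattern, ref))

    return min(range(len(codebook)), key=lambda i: distortion(codebook[i]))
```

In this sketch the segment boundaries are chosen from the spectral change alone, while the reference pattern is chosen from the distance between resampled trajectories, so the two decisions are made under different criteria. A segment that dwells on one spectrum before moving and a segment that moves immediately along the same path are resampled to the same pattern.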
However, the segment vocoder employs different criteria for the segmentation and for the matching, and hence does not minimize, as a whole, the spectral distortion that gives a measure of the speech quality. Furthermore, since the spectral matching loses the time information of spectral variations in each segment, the coded speech is accompanied by a spectral distortion. In addition, the reference pattern generating method is itself heuristic, and therefore the reference patterns for the variable length segment data are not optimum for reducing the spectral distortion. On this account, the prior art system cannot obtain sufficient intelligibility at a very low bit rate of around 200 b/s.