The present invention relates to a linear prediction process, and corresponding apparatus, for reducing the redundancy in the digital processing of speech in a system of the type wherein digitized speech signals are divided into sections and each section is analyzed for model filter characteristics, sound volume and pitch.
Speech processing systems of this type, so-called LPC vocoders, afford a substantial reduction in redundancy in the digital transmission of voice signals. They are becoming increasingly popular and are the subject of numerous publications and patents, examples of which include:
B. S. Atal and S. L. Hanauer, J. Acoust. Soc. Am., Vol. 50, pp. 637-655, 1971;
R. W. Schafer and L. R. Rabiner, Proc. IEEE, Vol. 63, No. 4, pp. 662-667, 1975;
L. R. Rabiner et al., Trans. Acoustics, Speech and Signal Proc., Vol. 24, No. 5, pp. 399-418, 1976;
B. Gold, IEEE, Vol. 65, No. 12, pp. 1636-1658, 1977;
A. Kurematsu et al., Proc. IEEE, ICASSP, Washington 1979, pp. 69-72;
S. Horwath, "LPC-Vocoders, State of Development and Outlook", Collected Volume of Symposium Papers "War in the Ether", No. XVII, Bern 1978;
U.S. Pat. Nos. 3,624,302; 3,361,520; 3,909,533; 4,230,905.
The presently known and available LPC vocoders do not yet operate in a fully satisfactory manner. Even though the speech that is synthesized after analysis is in most cases reasonably intelligible, it is distorted and sounds artificial. One cause of this limitation, among others, is the difficulty of deciding with adequate reliability whether a given section of speech is voiced or unvoiced. Further causes are the inadequate determination of the pitch period and the inaccurate determination of the parameters for a sound generating filter.
In addition to these fundamental difficulties, a further significant problem results from the fact that the data rate in many cases must be restricted to a relatively low value. For example, in telephone networks it is preferably only 2.4 kbit/sec. In the case of an LPC vocoder, the data rate is determined by the number of speech parameters analyzed in each speech section, the number of bits required for these parameters and the so-called frame rate, i.e. the number of speech sections per second. In the systems presently in use, a minimum of slightly more than 50 bits per frame is needed in order to obtain a barely usable reproduction of speech. Since the data rate is the product of the bits per frame and the frame rate, this requirement automatically determines the maximum frame rate. For example, in a 2.4 kbit/sec system it is approximately 45 frames/sec. The quality of speech at such relatively low frame rates is correspondingly poor. It is not possible to increase the frame rate, which in itself would improve the quality of speech, because the predetermined data rate would thereby be exceeded. Reducing the number of bits required per frame, on the other hand, would mean either using fewer parameters or lessening their resolution, which would likewise degrade the quality of speech reproduction.
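The arithmetic behind this trade-off can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the figure of 53 bits per frame is an assumed stand-in for "slightly more than 50 bits", not a value given in the text.

```python
def max_frame_rate(data_rate_bps: float, bits_per_frame: int) -> float:
    """Highest frame rate (frames/sec) that fits within a fixed data rate."""
    if bits_per_frame <= 0:
        raise ValueError("bits_per_frame must be positive")
    return data_rate_bps / bits_per_frame


def required_data_rate(frame_rate_hz: float, bits_per_frame: int) -> float:
    """Data rate (bit/sec) needed for a given frame rate and frame size."""
    return frame_rate_hz * bits_per_frame


CHANNEL_BPS = 2400   # 2.4 kbit/sec, the telephone-network figure from the text
FRAME_BITS = 53      # ASSUMPTION: illustrative value for "slightly more than 50 bits"

# Roughly 45 frames/sec fit into the channel at this frame size.
print(round(max_frame_rate(CHANNEL_BPS, FRAME_BITS), 1))

# Raising the frame rate to a hypothetical 60 frames/sec at the same
# frame size would exceed the channel capacity.
print(required_data_rate(60, FRAME_BITS) > CHANNEL_BPS)
```

With the channel rate fixed, a higher frame rate is only possible if the bits per frame are reduced, which is the constraint described above.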