This invention relates to a high-efficiency voice coding system and, particularly, to a high-quality speech transmission system that operates with a small amount of information.
The PARCOR system and the LSP system are widely known and used for efficiently coding voice sound into information at less than 10 kbps. These systems, however, are not adequate for transmitting the subtle voice characteristics that just allow the listener to identify the speaker. More sophisticated systems intended to improve this ability include the Multi-pulse method proposed by B. Atal of Bell Telephone Laboratories Inc. (B.S. Atal et al. "A New Model of LPC Excitation for producing Natural-Sounding Speech at Low Bit Rates", Proc. ICASSP 82 S5. 10, 1982), and the Thinned Residual method proposed by the inventors of the present invention (A. Ichikawa et al., "A Speech Coding Method Using Thinned-out Residual", Proc. ICASSP 85, 25.7, 1985). However, at least a certain amount of information (around 8 kbps) is required to assure the quality of the reproduced sound, and it is difficult to compress the information down to the 2.0-2.4 kbps used by international data lines and the like.
Another method for drastically compressing voice information is the Vector Quantization method (e.g., S. Roucos et al., "Segment Quantization for Very-Low-Rate Speech Coding", Proc. ICASSP 82, p. 1563). This method, however, mainly deals with information rates below 1 kbps, and the reproduced voice sound lacks clarity. Although the combination of the Vector Quantization method with the above-mentioned Multi-pulse method is now under study, the source information that determines the fine structure of the vectors must have considerable content. Therefore, transmitting vocal audio signals of a quality equivalent to that obtained at above 10 kbps, using an information content of around 2 kbps, is not feasible in the present state of the art.
Voice sound is produced by the mouth, which is a physically constrained organ of the human body. Consequently, the parameters representing the physical characteristics of the voice sound are unevenly distributed, taking values concentrated in limited regions. Namely, the mouth is limited in the variation of its shape, and therefore the range of vocal characteristics (e.g., the sound spectrum) is also limited.
In the Vector Quantization method, the parametric space in which the voice sound exists is partitioned into segments of a certain area, the segments are coded, and the vocal audio signal is transmitted in the form of codes. In methods such as the LPC method, the vocal signal is broken down into spectrum envelope information and fine structural information; both types of information are transmitted in the form of codes, and both types of codes are combined to reproduce the original voice sound in the receiver system. Both approaches are reputed for their ability to compress voice information efficiently and are applied to a wide range of purposes. In particular, spectrum envelope information is confined to a certain range of values, allows relatively simple approximation by combining a few resonant and antiresonant characteristics, and is therefore suitable for vector quantization.
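The partition-and-code idea behind vector quantization can be illustrated with a minimal sketch. The four-entry codebook, the two-dimensional parameter vectors, and the function names `encode` and `decode` below are hypothetical choices made only for illustration; they are not taken from the cited systems. Each parameter vector is replaced by the index of its nearest codebook entry, and only that index (2 bits per frame here) would be transmitted.

```python
import numpy as np

def encode(codebook, vectors):
    """Return, for each vector, the index of the nearest codebook entry."""
    # Squared Euclidean distance from every vector to every codebook entry.
    diffs = vectors[:, None, :] - codebook[None, :, :]
    distances = np.sum(diffs ** 2, axis=2)
    return np.argmin(distances, axis=1)

def decode(codebook, codes):
    """Receiver side: look up the representative vector for each code."""
    return codebook[codes]

# Hypothetical codebook: 4 entries, so each frame costs log2(4) = 2 bits.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Hypothetical parameter vectors, one per analysis frame.
frames = np.array([[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [1.1, 0.9]])

codes = encode(codebook, frames)          # what would be transmitted
reconstructed = decode(codebook, codes)   # what the receiver recovers
bits_per_frame = int(np.log2(len(codebook)))
```

The reconstruction is only an approximation of the input frames; a larger codebook reduces the error at the cost of more bits per frame.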
Several voice transmission methods have been proposed in which fine structural information is treated as noise because its characteristics resemble those of white noise, as described for example in G. Oyama et al., "A Stochastic Model of Excitation Source for Linear Prediction Speech Analysis-Synthesis", Proc. ICASSP 85, 25-2, 1985. However, this proposal is expected to require around 11.2 kbps for the fine structure alone, and compressing the information is not easy, as mentioned previously.
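The resemblance between the fine structure and white noise can be seen in a small linear prediction sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, a prediction order of 2, and a synthetic second-order autoregressive signal standing in for speech; all of these, along with the function name `lpc_coefficients`, are illustrative assumptions rather than details of the cited proposals. The predictor coefficients carry the spectrum envelope, and the prediction residual carries the fine structure.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns a with a[0] == 1, so the residual is e[n] = sum_k a[k] * x[n-k].
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Synthetic stand-in for a voiced frame: white noise through an all-pole filter.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
x = np.zeros_like(noise)
for t in range(len(x)):
    x[t] = noise[t]
    if t >= 1:
        x[t] += 1.3 * x[t - 1]
    if t >= 2:
        x[t] -= 0.6 * x[t - 2]

a = lpc_coefficients(x, order=2)            # spectrum envelope information
residual = np.convolve(x, a)[:len(x)]       # fine structural information

# Normalized lag-1 autocorrelation: near zero for a white-noise-like signal.
lag1 = np.dot(residual[:-1], residual[1:]) / np.dot(residual, residual)
```

In this sketch the residual is nearly uncorrelated from sample to sample, which is why it offers little redundancy to exploit and is costly to code directly.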